Patent abstract:
A device for decoding video data is configured to: perform interpolation filtering using an N-tap filter to generate an interpolated search space for a first block of video data; obtain a first predictive block in the interpolated search space; determine that a second block of video data is encoded using a bidirectional interprediction mode and a bidirectional optical flow (BIO) process; perform an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; and perform the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, where a number of reference samples used to calculate intermediate values for BIO offsets is limited to a region of (W+N-1)x(H+N-1) integer samples, where W and H correspond to a width and a height of the second block in integer samples.
Publication number: BR112019026775A2
Application number: BR112019026775-1
Filing date: 2018-06-22
Publication date: 2020-06-30
Inventors: Hsiao-Chiang Chuang;Jianle Chen;Kai Zhang;Xiang Li;Marta Karczewicz;Yi-Wen Chen;Wei-Jung Chien
Applicant: Qualcomm Incorporated
Primary IPC:
Patent description:

[0001] This application claims the benefit of U.S. Provisional Patent Application 62/524,398, filed June 23, 2017, the entire contents of which are incorporated herein by reference. TECHNICAL FIELD
[0002] This disclosure relates to video encoding and video decoding. BACKGROUND
[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones", video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices can transmit, receive, encode, decode and/or store digital video information more efficiently by implementing such video coding techniques.
[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video frame or a portion of a video frame) can be divided into video blocks, which may also be referred to as tree blocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded slice (I) of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded slice (P or B) of a picture can use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
[0005] Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data can be transformed from the pixel domain into a transform domain, resulting in residual transform coefficients, which can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding can be applied to achieve even more compression. SUMMARY
[0006] In general, the techniques of this disclosure relate to improvements to bidirectional optical flow (BIO) video coding techniques. More specifically, the techniques of this disclosure pertain to interprediction and motion vector reconstruction for video coding and to BIO-based interprediction refinement.
[0007] According to one example, a method of decoding video data includes determining that a first block of video data is encoded using an interprediction mode; performing interpolation filtering using an N-tap filter to generate an interpolated search space, where N is an integer and corresponds to a number of taps in the N-tap filter; obtaining a first predictive block for the first block of video data in the interpolated search space; determining that a second block of video data is encoded using a bidirectional interprediction mode; determining that the second block of video data is encoded using a bidirectional optical flow (BIO) process; performing an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; performing the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, where a number of reference samples used to calculate intermediate values for BIO offsets is limited to a region of (W+N-1)x(H+N-1) integer samples, where W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and outputting the BIO-refined version of the second predictive block.
[0008] According to another example, a device for decoding video data includes a memory configured to store the video data and one or more processors configured to: determine that a first block of video data is encoded using an interprediction mode; perform interpolation filtering using an N-tap filter to generate an interpolated search space, where N is an integer and corresponds to a number of taps in the N-tap filter; obtain a first predictive block for the first block of video data in the interpolated search space; determine that a second block of video data is encoded using a bidirectional interprediction mode; determine that the second block of video data is encoded using a bidirectional optical flow (BIO) process; perform an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; perform the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, where a number of reference samples used to calculate intermediate values for BIO offsets is limited to a region of (W+N-1)x(H+N-1) integer samples, where W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and output the BIO-refined version of the second predictive block.
[0009] According to another example, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to: determine that a first block of video data is encoded using an interprediction mode; perform interpolation filtering using an N-tap filter to generate an interpolated search space, where N is an integer and corresponds to a number of taps in the N-tap filter; obtain a first predictive block for the first block of video data in the interpolated search space; determine that a second block of video data is encoded using a bidirectional interprediction mode; determine that the second block of video data is encoded using a bidirectional optical flow (BIO) process; perform an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; perform the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, where a number of reference samples used to calculate intermediate values for BIO offsets is limited to a region of (W+N-1)x(H+N-1) integer samples, where W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and output the BIO-refined version of the second predictive block.
[0010] According to another example, a device for decoding video data includes means for determining that a first block of video data is encoded using an interprediction mode; means for performing interpolation filtering using an N-tap filter to generate an interpolated search space, where N is an integer and corresponds to a number of taps in the N-tap filter; means for obtaining a first predictive block for the first block of video data in the interpolated search space; means for determining that a second block of video data is encoded using a bidirectional interprediction mode; means for determining that the second block of video data is encoded using a bidirectional optical flow (BIO) process; means for performing an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; means for performing the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, where a number of reference samples used to calculate intermediate values for BIO offsets is limited to a region of (W+N-1)x(H+N-1) integer samples, where W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and means for outputting the BIO-refined version of the second predictive block.
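As an illustrative aside, the following is a minimal sketch (not part of the examples above) of how the (W+N-1)x(H+N-1) constraint could be enforced by clamping sample coordinates to the region that an N-tap motion-compensation filter already requires; the function and variable names are hypothetical, not taken from any actual codec.

```python
# Hedged sketch: restrict BIO intermediate-value fetches to the
# (W + N - 1) x (H + N - 1) integer-sample region already needed by
# N-tap motion-compensation interpolation. Coordinates are relative
# to the block's top-left integer sample; names are illustrative.
def clamp_to_mc_region(x, y, W, H, N):
    # An N-tap filter needs N//2 - 1 integer samples beyond the block
    # on the top/left and N//2 on the bottom/right, so the region holds
    # (W + N - 1) x (H + N - 1) samples in total.
    lo = -(N // 2 - 1)
    x_hi = W - 1 + N // 2
    y_hi = H - 1 + N // 2
    return (min(max(x, lo), x_hi), min(max(y, lo), y_hi))

# Example: for an 8x8 block and an 8-tap filter, the region spans
# x, y in [-3, 11], i.e. 15 = 8 + 8 - 1 samples per dimension.
print(clamp_to_mc_region(-5, 13, W=8, H=8, N=8))  # -> (-3, 11)
```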
[0011] The details of one or more examples of the disclosure are set forth in the accompanying drawings and in the description below. Other features, objects and advantages will be apparent from the description, drawings and claims. BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Figure 1 is a block diagram illustrating an exemplary video encoding and decoding system that can utilize techniques for bidirectional optical flow (BIO).
[0013] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME) with a block-matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC).
[0014] Figure 3 is a conceptual diagram illustrating an example of bilateral ME as a BMA performed for MC-FRUC.
[0015] Figure 4 shows an example of an optical flow trajectory.
[0016] Figure 5 shows an example of BIO for an 8x4 block.
[0017] Figure 6 shows an example of modified BIO for an 8x4 block.
[0018] Figures 7A and 7B show examples of sub-blocks where Overlapped Block Motion Compensation (OBMC) applies.
[0019] Figures 8A to 8D show examples of OBMC weights.
[0020] Figure 9 shows an example of an MC process.
[0021] Figure 10 shows an example of BIO application.
[0022] Figure 11 shows an example of BIO application.
[0023] Figure 12 shows an example of BIO application.
[0024] Figure 13 shows an example of BIO application.
[0025] Figure 14 shows an example of BIO application.
[0026] Figure 15 shows an example of BIO application.
[0027] Figure 16 shows an illustration of the pixels used to apply a BIO process.
[0028] Figure 17 shows an illustration of BIO derived from Ref0/Ref1 and applied to the MC predictors P0/P1.
[0029] Figure 18 shows an illustration of BIO derived from/applied to the MC predictors P0/P1.
[0030] Figure 19 shows an illustration of
[0031] Figure 20 shows an illustration of BIO derived from and applied to the MC predictors P0/P1 with parallel processing of an OBMC process and a BIO process.
[0032] Figure 21 is a block diagram illustrating an example of a video encoder.
[0033] Figure 22 is a block diagram illustrating an example of a video decoder.
[0034] Figure 23 is a flowchart illustrating an exemplary method of decoding video data in accordance with the techniques described in this disclosure. DETAILED DESCRIPTION
[0035] In general, the techniques of this disclosure relate to improvements to bidirectional optical flow (BIO) video coding techniques. More specifically, the techniques of this disclosure pertain to interprediction and motion vector reconstruction for video coding and to BIO-based interprediction refinement. BIO can be applied during motion compensation. In general, BIO is used to modify a motion vector on a per-pixel basis for a current block, so that the pixels of the current block are predicted using corresponding offset values applied to the motion vector. The various techniques of this disclosure can be applied, alone or in combination, to determine when and whether BIO is performed in predicting blocks of video data, for example, during motion compensation. In one example, the techniques of this disclosure include performing BIO when the motion vectors used to interpredict a block of video data of a current picture, relative to reference blocks of reference pictures in a common prediction direction, are proportional, or almost proportional, to the temporal distances between the current picture and the reference pictures, and preventing BIO from being performed otherwise. In some examples, BIO may be performed only when the block is not in an illumination change region. Furthermore, the techniques by which BIO is performed generally include calculating gradients for the blocks. According to the techniques of this disclosure, the gradients can be modified according to the temporal distances between the current picture and the reference pictures.
[0036] The techniques of this disclosure can be applied to any existing video codec, such as those conforming to ITU-T H.264/AVC (Advanced Video Coding) or High Efficiency Video Coding (HEVC), also referred to as ITU-T H.265. H.264 is described in International Telecommunication Union, "Advanced video coding for generic audiovisual services", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, H.264, June 2011, and H.265 is described in International Telecommunication Union, "High efficiency video coding", SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, April 2015. The techniques of this disclosure may also be applied to other previous or future video coding standards as an efficient coding tool.
[0037] An overview of HEVC is described in G. J. Sullivan, J.-R. Ohm, W.-J. Han, T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pages 1649-1668, December 2012. The latest HEVC draft specification is available at http://phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip. The latest version of the Final Draft International Standard (FDIS) for HEVC is described in JCTVC-L1003_v34, available at http://phenix.it-sudparis.eu/jct/doc_end_user/documents/12_Geneva/wg11/JCTVC-L1003-v34.zip
[0038] Other video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and the Scalable Video Coding (SVC) and Multi-View Video Coding (MVC) extensions of H.264, as well as HEVC extensions such as the range extension, the multi-view extension (MV-HEVC) and the scalable extension (SHVC). In April 2015, the Video Coding Experts Group (VCEG) began a new research project aimed at a next generation of video coding standard. The reference software is called HM-KTA.
[0039] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need to standardize future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard. The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET).
[0040] The JVET first met between October 19 and 21, 2015. A version of the JVET reference software, including an algorithm description, is set out in the document Joint Exploration Model 5 (JEM 5), J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm Description of Joint Exploration Test Model 5 (JEM 5)", JVET-E1001, January 2017. Another version of the JVET reference software is described in the document Joint Exploration Model 6 (JEM 6), J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm description of Joint Exploration Test Model 6 (JEM 6)", JVET-F1001, April 2017. Another version of the JVET reference software is described in the document Joint Exploration Model 7 (JEM 7), J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, J. Boyce, "Algorithm description of Joint Exploration Test Model 7 (JEM 7)", JVET-G1001, July 2017.
[0041] Certain techniques of this disclosure may be described with reference to H.264 and/or HEVC to aid understanding, but the techniques described are not limited to H.264 or HEVC and can be used in combination with other coding standards and other coding tools.
[0042] The following discussion pertains to motion information. In general, a picture is divided into blocks, each of which may be predictively coded. Prediction of a current block can generally be performed using intraprediction techniques (which use data from the picture that includes the current block) or interprediction techniques (which use data from a previously coded picture relative to the picture that includes the current block). The interprediction can be unidirectional prediction or bidirectional prediction.
[0043] For each interpredicted block, a set of motion information may be available. A set of motion information can contain motion information for forward and/or backward prediction directions. Here, forward and backward prediction directions are two prediction directions of a bidirectional prediction mode. The terms "forward" and "backward" do not necessarily have a geometric meaning. Instead, the terms "forward" and "backward" generally correspond to whether the reference pictures are to be displayed before ("backward") or after ("forward") the current picture. In some examples, the "forward" and "backward" prediction directions may correspond to reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1) of a current picture. When only one reference picture list is available for a picture or slice, only RefPicList0 may be available, and the motion information of each block of a slice may refer to a picture of RefPicList0 (e.g., subsequent to the current picture).
[0044] In some cases, a motion vector along with its reference index is used in a decoding process. Such a motion vector with its associated reference index is denoted as a unipredictive set of motion information.
[0045] For each prediction direction, the motion information contains a reference index and a motion vector. In some cases, for the sake of simplicity, a motion vector itself may be referred to in a way that it is assumed to have an associated reference index. A reference index can be used to identify a reference picture in the current reference picture list (for example, RefPicList0 or RefPicList1). A motion vector has a horizontal (x) and a vertical (y) component. In general, the horizontal component indicates a horizontal displacement within a reference picture, relative to the position of a current block in a current picture, used to locate the x-coordinate of a reference block, while the vertical component indicates a vertical displacement within the reference picture, relative to the position of the current block, used to locate the y-coordinate of the reference block.
[0046] Picture order count (POC) values are used in video coding standards to identify a display order of pictures. Although there are cases where two pictures within one coded video sequence may have the same POC value, this typically does not happen within a coded video sequence. Therefore, POC values of pictures are generally unique and thus can uniquely identify the corresponding pictures. When multiple coded video sequences are present in a bit stream, pictures that have the same POC value may be closer to each other in terms of decoding order. POC values of pictures are typically used for building reference picture lists, deriving reference picture sets as in HEVC, and motion vector scaling.
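As a hedged illustration of the motion vector scaling just mentioned, in the spirit of the HEVC derivation (the exact integer rounding of the standard is simplified here), a vector can be rescaled by the ratio of POC distances; all names are illustrative.

```python
# Hedged sketch of POC-based motion vector scaling: the vector is
# scaled by the ratio of POC distances tb/td. Integer rounding details
# of the real derivation are simplified; names are illustrative.
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    td = poc_cur - poc_ref_src   # distance covered by the stored MV
    tb = poc_cur - poc_ref_dst   # distance the scaled MV should cover
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# An MV of (8, -4) toward a reference two pictures away, rescaled for
# a reference one picture away, becomes (4, -2).
print(scale_mv((8, -4), poc_cur=4, poc_ref_src=2, poc_ref_dst=3))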
[0047] E. Alshina, A. Alshin, J.-H. Min, K. Choi, A. Saxena, M. Budagavi, "Known tools performance investigation for next generation video coding", ITU Telecommunication Standardization Sector, Study Group 16, Question 6, Video Coding Experts Group (VCEG), VCEG-AZ05, June 2015, Warsaw, Poland (hereinafter "Alshina 1"), and A. Alshin, E. Alshina, T. Lee, "Bi-directional optical flow for improving motion compensation", Picture Coding Symposium (PCS), Nagoya, Japan, 2010 (hereinafter "Alshina 2"), described BIO. BIO is based on pixel-level optical flow. According to Alshina 1 and Alshina 2, BIO is only applied to blocks that have both forward and backward prediction. BIO as described in Alshina 1 and Alshina 2 is summarized below:
[0048] Given a pixel value $I_t$ at time $t$, its first-order Taylor expansion is
$$I_t \approx I_{t_0} + \frac{\partial I_{t_0}}{\partial t}\,(t - t_0) \qquad \text{(A)}$$
[0049] $I_{t_0}$ is on the motion trajectory of $I_t$. That is, the motion from $I_{t_0}$ to $I_t$ is considered in the formula.
[0050] Under the assumption of optical flow, $\frac{\partial I_{t_0}}{\partial t} = -\left( \frac{\partial I_{t_0}}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial I_{t_0}}{\partial y}\frac{\partial y}{\partial t} \right)$. Letting $G_{x0} = \partial I_{t_0}/\partial x$ and $G_{y0} = \partial I_{t_0}/\partial y$ (the gradient), equation (A) becomes
$$I_t = I_{t_0} - \left( G_{x0}\frac{\partial x}{\partial t} + G_{y0}\frac{\partial y}{\partial t} \right)(t - t_0) \qquad \text{(B)}$$
[0051] Regarding $\partial x/\partial t$ and $\partial y/\partial t$ as the moving speeds, $V_{x0}$ and $V_{y0}$ can be used to represent them.
[0052] Then, equation (B) becomes
$$I_t = I_{t_0} - \left( G_{x0} V_{x0} + G_{y0} V_{y0} \right)(t - t_0) \qquad \text{(C)}$$
[0053] Suppose, as an example, a forward reference at $t_0$ and a backward reference at $t_1$, with
$$t_0 - t = t - t_1 = \Delta t = 1$$
[0054] This leads to:
$$I_t = \frac{I_{t_0} + I_{t_1}}{2} - \frac{\left( G_{x0} V_{x0} + G_{y0} V_{y0} \right)(t - t_0) + \left( G_{x1} V_{x1} + G_{y1} V_{y1} \right)(t - t_1)}{2} \qquad \text{(D)}$$
[0055] It is additionally assumed that $V_{x0} = V_{x1} = V_x$ and $V_{y0} = V_{y1} = V_y$, since the motion is along the trajectory. Then, equation (D) becomes
$$I_t = \frac{I_{t_0} + I_{t_1}}{2} + \frac{\Delta G_x \cdot V_x + \Delta G_y \cdot V_y}{2} \qquad \text{(E)}$$
where $\Delta G_x = G_{x0} - G_{x1}$ and $\Delta G_y = G_{y0} - G_{y1}$ can be calculated based on the reconstructed references. Since $(I_{t_0} + I_{t_1})/2$ is the regular bi-prediction, $(\Delta G_x \cdot V_x + \Delta G_y \cdot V_y)/2$ is called the BIO offset henceforth for the sake of convenience.
[0056] $V_x$ and $V_y$ are derived at both the encoder and the decoder by minimizing the following distortion, that is, the squared difference between the two one-sided extrapolations of $I_t$, summed over the pixels $i$ of the block:
$$\min_{V_x, V_y} \sum_{i} \left( \left( I^i_{t_0} - I^i_{t_1} \right) + V_x \left( G^i_{x0} + G^i_{x1} \right) + V_y \left( G^i_{y0} + G^i_{y1} \right) \right)^2$$
[0057] With $V_x$ and $V_y$ derived, the final prediction of the block is calculated with (E). $V_x$ and $V_y$ are called the "BIO motion" for the sake of convenience.
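A minimal numeric sketch, assuming the reconstruction of equation (E) above: the refined prediction is the regular bi-prediction plus half of the BIO offset. The function and variable names are illustrative, and numpy stands in for per-pixel arrays.

```python
# Hedged sketch of equation (E) as reconstructed above: the refined
# prediction is the regular bi-prediction plus half the BIO offset.
import numpy as np

def bio_refine(I_t0, I_t1, Gx0, Gx1, Gy0, Gy1, vx, vy):
    dGx = Gx0 - Gx1                      # gradient differences computed
    dGy = Gy0 - Gy1                      # from the reconstructed references
    regular_bipred = (I_t0 + I_t1) / 2.0
    bio_offset = (dGx * vx + dGy * vy) / 2.0
    return regular_bipred + bio_offset   # equation (E)
```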
[0058] In general, a video coder (e.g., a video encoder and/or a video decoder) performs BIO during motion compensation. That is, after the video coder determines a motion vector for a current block, the video coder produces a predicted block for the current block using motion compensation with respect to the motion vector. In general, the motion vector identifies the location of a reference block relative to the current block in a reference picture. When performing BIO, the video coder modifies the motion vector on a per-pixel basis for the current block. That is, rather than retrieving each pixel of the reference block as a block unit, according to BIO, the video coder determines per-pixel modifications to the motion vector for the current block, and constructs the reference block such that it includes the reference pixels identified by the motion vector and the per-pixel modification for the corresponding pixel of the current block. Thus, BIO can be used to produce a more accurate reference block for the current block.
[0059] Figure 1 is a block diagram illustrating an exemplary video encoding and decoding system 10 that can utilize techniques for bidirectional optical flow. As shown in Figure 1, the system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, the source device 12 provides the video data to the destination device 14 via a computer-readable medium 16. The source device 12 and the destination device 14 can be any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, the source device 12 and the destination device 14 may be equipped for wireless communication.
[0060] The destination device 14 may receive the encoded video data to be decoded via the computer-readable medium 16. The computer-readable medium 16 may be any type of medium or device capable of moving the encoded video data from the source device 12 to the destination device 14. In one example, the computer-readable medium 16 may be a communication medium to enable the source device 12 to transmit encoded video data directly to the destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium may be any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful in facilitating communication from the source device 12 to the destination device 14.
[0061] In some examples, encoded data may be output from the output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device via the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard disk, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other digital storage media suitable for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device 12. The destination device 14 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14.
[0062] The techniques of this disclosure are not necessarily limited to wireless applications or configurations. The techniques can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions over the Internet, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[0063] In the example of Figure 1, the source device 12 includes a video source 18, a video encoder 20 and an output interface 22. The destination device 14 includes an input interface 28, a video decoder 30 and a display device 32. In accordance with this disclosure, the video encoder 20 of the source device 12 can be configured to apply the techniques for bidirectional optical flow. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, the destination device 14 may interface with an external display device, rather than including an integrated display device.
[0064] The system 10 illustrated in Figure 1 is merely one example. Techniques for bidirectional optical flow can be performed by any digital video encoding and/or decoding device. Although, in general, the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "CODEC". Furthermore, the techniques of this disclosure can also be performed by a video processor. The source device 12 and the destination device 14 are merely examples of such coding devices, in which the source device 12 generates coded video data for transmission to the destination device 14. In some examples, the devices 12, 14 may operate in a substantially symmetrical manner, such that each of the devices 12, 14 includes video encoding and decoding components. Therefore, the system 10 can support one-way or two-way video transmission between the video devices 12, 14, for example, for video streaming, video playback, video broadcasting or video telephony.
[0065] The video source 18 of the source device 12 may include a video capture device, such as a video camera, a video file containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, the video source 18 may generate computer-graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source 18 is a video camera, the source device 12 and the destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may apply to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video may be encoded by the video encoder 20. The encoded video information may then be output by the output interface 22 onto a computer-readable medium 16.
[0066] The computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transient storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from the source device 12 and provide the encoded video data to the destination device 14, for example, via network transmission. Similarly, a computing device of a media production facility, such as a disc stamping facility, may receive encoded video data from the source device 12 and produce a disc containing the encoded video data. Therefore, the computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.
[0067] The input interface 28 of the destination device 14 receives information from the computer-readable medium 16. The information from the computer-readable medium 16 may include syntax information defined by the video encoder 20, which is also used by the video decoder 30, including syntax elements that describe characteristics and/or processing of the video data. The display device 32 displays the decoded video data to a user, and may be any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
[0068] The video encoder 20 and the video decoder 30 can operate in accordance with a video coding standard, such as the HEVC standard introduced above, also referred to as ITU-T H.265. In some examples, the video encoder 20 and the video decoder 30 may operate in accordance with other proprietary and industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any specific coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in Figure 1, in some aspects the video encoder 20 and the video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common bit stream or in separate data streams. If applicable, the MUX-DEMUX units can conform to the ITU H.223 multiplexer protocol, or to other protocols such as the User Datagram Protocol (UDP).
[0069] In HEVC and other video coding specifications, a video sequence typically includes a series of pictures. Pictures may also be referred to as "frames". A picture may include three sample arrays, denoted SL, SCb and SCr. SL is a two-dimensional array (i.e., a block) of luma samples. SCb is a two-dimensional array of Cb chrominance samples. SCr is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as "chroma" samples. In other instances, a picture may be monochrome and may only include an array of luma samples.
[0070] To generate an encoded representation of a picture, the video encoder 20 may generate a set of coding tree units (CTUs). Each of the CTUs can include a coding tree block of luma samples, two corresponding coding tree blocks of chroma samples, and syntax structures used to code the samples of the coding tree blocks. In monochrome pictures or pictures that have three separate color planes, a CTU can include a single coding tree block and the syntax structures used to code the samples of the coding tree block. A coding tree block can be an NxN block of samples. A CTU may also be referred to as a "tree block" or a "largest coding unit" (LCU). The CTUs of HEVC can be broadly analogous to the macroblocks of other standards, such as H.264/AVC. However, a CTU is not necessarily limited to a specific size and may include one or more coding units (CUs). A slice can include an integer number of CTUs ordered consecutively in a raster scan order.
[0071] A CTB contains a quadtree whose nodes are coding units. The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although technically CTB sizes of 8x8 can be supported). In some examples, a coding unit (CU) can be the same size as a CTB and as small as 8x8. Each coding unit is coded with one mode. When a CU is inter-coded, the CU can be further partitioned into 2 or 4 prediction units (PUs), or become just one PU when further partitioning does not apply. When two PUs are present in one CU, the two PUs can each be, for example, rectangles of half the size, or two rectangles that are ¼ or ¾ the size of the CU.
[0072] To generate a coded CTU, the video encoder 20 can recursively perform quadtree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units". A coding block can be an NxN block of samples. A CU may include a coding block of luma samples and two corresponding coding blocks of chroma samples of a picture that has a luma sample array, a Cb sample array and a Cr sample array, and the syntax structures used to code the samples of the coding blocks. In monochrome pictures or pictures that have three separate color planes, a CU can include a single coding block and the syntax structures used to code the samples of the coding block.
[0073] The video encoder 20 can partition a coding block of a CU into one or more prediction blocks, also referred to as predictive blocks. A prediction block is a rectangular block of samples to which the same prediction is applied. A prediction unit (PU) of a CU may include a prediction block of luma samples, two corresponding prediction blocks of chroma samples, and syntax structures used to predict the prediction blocks. In monochrome pictures or pictures that have three separate color planes, a PU can include a single prediction block and the syntax structures used to predict the samples of the prediction block. The video encoder 20 can generate predictive luma, Cb and Cr blocks for the luma, Cb and Cr prediction blocks of each PU of the CU.
[0074] The video encoder 20 can use intraprediction or interprediction to generate the predictive blocks for a PU. If the video encoder 20 uses intraprediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the picture associated with the PU. When the video encoder 20 uses interprediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of one or more pictures other than the picture associated with the PU. When the CU is inter-coded, one set of motion information can be present for each PU. In addition, each PU can be coded with a unique interprediction mode to derive the set of motion information.
[0075] After the video encoder 20 generates predictive luma, Cb and Cr blocks for one or more PUs of a CU, the video encoder 20 can generate a luma residual block for the CU. Each sample in the luma residual block of the CU indicates a difference between a luma sample in one of the predictive luma blocks of the CU and a corresponding sample in the original luma coding block of the CU. Furthermore, the video encoder 20 can generate a Cb residual block for the CU. Each sample in the Cb residual block of the CU can indicate a difference between a Cb sample in one of the predictive Cb blocks of the CU and a corresponding sample in the original Cb coding block of the CU. The video encoder 20 can also generate a Cr residual block for the CU. Each sample in the Cr residual block of the CU can indicate a difference between a Cr sample in one of the predictive Cr blocks of the CU and a corresponding sample in the original Cr coding block of the CU.
[0076] Furthermore, the video encoder 20 can use quadtree partitioning to decompose the luma, Cb and Cr residual blocks of a CU into one or more luma, Cb and Cr transform blocks. A transform block is a rectangular block of samples to which the same transform is applied. A transform unit (TU) of a CU may include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax structures used to transform the samples of the transform blocks. Thus, each TU of a CU can be associated with a luma transform block, a Cb transform block and a Cr transform block. The luma transform block associated with the TU may be a sub-block of the luma residual block of the CU. The Cb transform block may be a sub-block of the Cb residual block of the CU. The Cr transform block may be a sub-block of the Cr residual block of the CU. In monochrome pictures or pictures that have three separate color planes, a TU can include a single transform block and the syntax structures used to transform the samples of the transform block.
[0077] The video encoder 20 may apply one or more transforms to a luma transform block of a TU to generate a luma coefficient block for the TU. A transform coefficient block can be a two-dimensional array of transform coefficients.
[0078] After generating a coefficient block (e.g., a luma coefficient block, a Cb coefficient block, or a Cr coefficient block), the video encoder 20 can quantize the coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. After the video encoder 20 quantizes a coefficient block, the video encoder 20 can entropy encode syntax elements that indicate the quantized transform coefficients. For example, the video encoder 20 can perform Context-Adaptive Binary Arithmetic Coding (CABAC) on the syntax elements that indicate the quantized transform coefficients.
[0079] The video encoder 20 can output a bit stream that includes a sequence of bits forming a representation of coded pictures and associated data. The bit stream can include a sequence of network abstraction layer (NAL) units. A NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes that contain that data in the form of a raw byte sequence payload (RBSP), interspersed with emulation prevention bits. Each of the NAL units includes a NAL unit header and encapsulates an RBSP. The NAL unit header may include a syntax element that indicates a NAL unit type code. The NAL unit type code specified by the NAL unit header of a NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure that contains an integer number of bytes that is encapsulated within a NAL unit. In some instances, an RBSP includes zero bits.
[0080] Different types of NAL units can encapsulate different types of RBSPs. For example, a first type of NAL unit may encapsulate an RBSP for a PPS, a second type of NAL unit may encapsulate an RBSP for a coded slice, a third type of NAL unit may encapsulate an RBSP for SEI, and so on. NAL units that encapsulate RBSPs for video coding data (as opposed to RBSPs for parameter sets and SEI messages) may be referred to as VCL NAL units.
[0081] The video decoder 30 can receive a bit stream generated by the video encoder 20.
[0082] In accordance with the techniques of this disclosure, the video encoder 20 and/or the video decoder 30 can additionally perform BIO techniques during motion compensation, as discussed in more detail below.
[0083] The video encoder 20 and the video decoder 30 may each be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). A device that includes the video encoder 20 and/or the video decoder 30 may include an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
[0084] Figure 2 is a conceptual diagram illustrating an example of unilateral motion estimation (ME) with a block-matching algorithm (BMA) performed for motion-compensated frame rate up-conversion (MC-FRUC). In general, a video coder (such as video encoder 20 or video decoder 30) performs unilateral ME to obtain motion vectors (MVs), such as MV 112, by searching for the best matching block (e.g., reference block 108) of reference frame 102 for current block 106 of current frame 100. Then, the video coder interpolates an interpolated block 110 along the motion trajectory of motion vector 112 in interpolated frame 104. That is, in the example of Figure 2, motion vector 112 passes through the midpoints of current block 106, reference block 108 and interpolated block 110.
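As a hedged illustration of the block-matching search just described (not the patent's own procedure), a sum-of-absolute-differences (SAD) criterion over a small search range might look like this; all names are illustrative.

```python
# Hedged sketch of unilateral ME with a block-matching algorithm: find
# the MV minimizing SAD between the current block and candidate
# reference blocks within a small search range. Frames are integer
# numpy arrays of luma samples; names are illustrative.
import numpy as np

def block_match(cur, ref, bx, by, size=8, search=4):
    block = cur[by:by + size, bx:bx + size].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= ref.shape[1] - size and 0 <= y <= ref.shape[0] - size:
                cand = ref[y:y + size, x:x + size].astype(np.int64)
                sad = int(np.abs(block - cand).sum())  # sum of absolute diffs
                if best_sad is None or sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
    return best_mv
```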
[0085] As shown in Figure 2, three blocks in three frames are involved following the motion trajectory. Although the current block 106 in the current frame 100 belongs to a coded block, the best matching block in the reference frame 102 (that is, reference block 108) need not fully belong to a coded block (that is, the best matching block may not fall on a coded block boundary, but may instead overlap such a boundary). Likewise, the interpolated block 110 in the interpolated frame 104 need not fully belong to a coded block. Consequently, overlapped regions of blocks and unfilled regions (holes) may occur in the interpolated frame 104.
[0086] To deal with overlaps, simple FRUC algorithms may simply involve averaging and overwriting the overlapped pixels. Furthermore, holes can be covered by pixel values from a reference or current frame. However, these algorithms can result in blocking artifacts and blurring. Hence, motion field segmentation, successive extrapolation using the discrete Hartley transform, and image inpainting can be used to address holes and overlaps without increasing blocking artifacts and blurring.
[0087] Figure 3 is a conceptual diagram illustrating an example of bilateral ME as a BMA performed for MC-FRUC. Bilateral ME is another solution (in MC-FRUC) that can be used to avoid the problems caused by overlaps and holes. A video coder (such as video encoder 20 and/or video decoder 30) performing bilateral ME obtains MVs 132, 134 passing through interpolated block 130 of interpolated frame 124 (which is intermediate between current frame 120 and reference frame 122) using temporal symmetry between current block 126 of current frame 120 and reference block 128 of reference frame 122. As a result, the video coder does not generate overlaps and holes in the interpolated frame 124.
[0088] In the HEVC standard, there are two interprediction modes, named merge mode (with skip mode considered as a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively, for a PU. In merge mode, a video encoder and a video decoder generate the same list of spatial, temporal and artificially generated motion vector candidates, with each candidate including a motion vector and a reference picture index. For an interpredicted block coded in merge mode, the video encoder includes an index of the candidate used to predict the block, and the video decoder decodes the block using the motion vector and reference picture index associated with the candidate identified by the index. In AMVP mode, a video encoder and a video decoder generate the same list of motion vector candidates, with each candidate including only a motion vector. For an interpredicted block coded in AMVP mode, the video encoder includes an index of the candidate used to predict the block, a motion vector difference, and a reference picture index, and the video decoder decodes the block using the motion vector associated with the candidate identified by the index as a motion vector predictor. That is, the video decoder uses the motion vector predictor plus the motion vector difference to determine a motion vector for the block.
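A minimal sketch contrasting the two modes just described; candidate-list construction is elided and all names are illustrative.

```python
# Hedged sketch of the two HEVC inter modes described above: merge
# mode copies the candidate's MV and reference index, while AMVP
# treats the candidate as a predictor and adds a signalled difference.
def decode_merge(cand_list, merge_idx):
    mv, ref_idx = cand_list[merge_idx]
    return mv, ref_idx                        # candidate used as-is

def decode_amvp(cand_list, mvp_idx, mvd, ref_idx):
    mvp = cand_list[mvp_idx]                  # candidate is predictor only
    mv = (mvp[0] + mvd[0], mvp[1] + mvd[1])   # MV = MVP + MVD
    return mv, ref_idx                        # ref index signalled explicitly
```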
[0089] Figure 4 shows an example of an optical flow trajectory for BIO. In the example of Figure 4, B-picture 180 is a bidirectionally interpredicted picture that is being predicted using reference picture 182 (Ref0) and reference picture 184 (Ref1). BIO uses pixel-wise motion refinement, which is performed on top of block-wise motion compensation in the case of bi-prediction. Because BIO compensates for fine motion within the block, enabling BIO potentially results in enlarging the block size for motion compensation. Sample-level motion refinement does not require exhaustive search or signaling, since an explicit equation provides the fine motion vector for each sample.
[0090] $I^{(k)}$ represents the luminance value of reference $k$ ($k = 0, 1$) after motion compensation is performed for a bi-predicted block, and $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ are the horizontal and vertical components of the gradient, respectively. Assuming that the optical flow is valid, the motion vector field $(v_x, v_y)$ is given by the following equation:
$$\frac{\partial I^{(k)}}{\partial t} + v_x \frac{\partial I^{(k)}}{\partial x} + v_y \frac{\partial I^{(k)}}{\partial y} = 0 \qquad (1)$$
[0091] Combining the optical flow equation with Hermite interpolation for the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values $I^{(k)}$ and the derivatives at the ends. The value of this polynomial at $t = 0$ is the BIO prediction:
$$\mathrm{pred}_{BIO} = \frac{1}{2} \left( I^{(0)} + I^{(1)} + \frac{v_x}{2} \left( \tau_1 \frac{\partial I^{(1)}}{\partial x} - \tau_0 \frac{\partial I^{(0)}}{\partial x} \right) + \frac{v_y}{2} \left( \tau_1 \frac{\partial I^{(1)}}{\partial y} - \tau_0 \frac{\partial I^{(0)}}{\partial y} \right) \right) \qquad (2)$$
[0092] In equation (2), $\tau_0$ and $\tau_1$ correspond to the distances to the reference frames, as shown in Figure 4. The distances $\tau_0$ and $\tau_1$ are calculated based on the POC values of Ref0 and Ref1: $\tau_0 = POC(\text{current}) - POC(\text{Ref0})$, $\tau_1 = POC(\text{Ref1}) - POC(\text{current})$. If both predictions come from the same temporal direction (either both from the past or both from the future), then the signs are different, that is, $\tau_0 \cdot \tau_1 < 0$.
[0093] The motion vector field $(v_x, v_y)$ is determined by minimizing the difference $\Delta$ between the values at points A and B, which correspond to the intersection of the motion trajectory with the reference frame planes in Figure 4. This intersection is shown as point 186 in Figure 4. One model uses only the first linear term of the local Taylor expansion for $\Delta$:
$$\Delta = \left( I^{(0)} - I^{(1)} \right) + v_x \left( \tau_1 \frac{\partial I^{(1)}}{\partial x} + \tau_0 \frac{\partial I^{(0)}}{\partial x} \right) + v_y \left( \tau_1 \frac{\partial I^{(1)}}{\partial y} + \tau_0 \frac{\partial I^{(0)}}{\partial y} \right) \qquad (3)$$
[0094] All values in equation (3) depend on the sample location $(i', j')$, which has been omitted so far. Assuming that the motion is consistent in the local surroundings, $\Delta$ can be minimized inside the $(2M+1) \times (2M+1)$ square window $\Omega$ centered on the currently predicted point $(i, j)$:
$$(v_x, v_y) = \arg\min_{v_x, v_y} \sum_{(i', j') \in \Omega} \Delta^2(i', j') \qquad (4)$$
[0095] For this optimization problem, a simplified solution that performs a first minimization in the vertical direction and then in the horizontal direction can be used, which results in:
$$v_x = (s_1 + r) > m \;?\; \mathrm{clip3}\left( -thBIO,\; thBIO,\; -\frac{s_3}{s_1 + r} \right) \;:\; 0 \qquad (5)$$
$$v_y = (s_5 + r) > m \;?\; \mathrm{clip3}\left( -thBIO,\; thBIO,\; -\frac{s_6 - v_x s_2 / 2}{s_5 + r} \right) \;:\; 0 \qquad (6)$$
where, writing $G_x = \tau_1 \partial I^{(1)}/\partial x + \tau_0 \partial I^{(0)}/\partial x$, $G_y = \tau_1 \partial I^{(1)}/\partial y + \tau_0 \partial I^{(0)}/\partial y$ and $\delta I = I^{(0)} - I^{(1)}$ for the terms of equation (3),
$$s_1 = \sum_{\Omega} G_x^2; \quad s_2 = \sum_{\Omega} G_x G_y; \quad s_3 = \sum_{\Omega} \delta I \cdot G_x; \quad s_5 = \sum_{\Omega} G_y^2; \quad s_6 = \sum_{\Omega} \delta I \cdot G_y \qquad (7)$$
[0096] In order to avoid division by zero or by a very small value, the regularization parameters $r$ and $m$ are introduced in equations (5) and (6):
$$r = 500 \cdot 4^{d-8} \qquad (8)$$
$$m = 700 \cdot 4^{d-8} \qquad (9)$$
Here, $d$ is the internal bit depth of the input video.
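A hedged numeric sketch of equations (5) to (9) as reconstructed above; the input values of s1..s6 and the thBIO value in the example are made up for illustration.

```python
# Hedged sketch: vx and vy from the window sums s1..s6 with the
# regularization parameters r and m, clipped to thBIO.
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def solve_bio_motion(s1, s2, s3, s5, s6, d, th_bio):
    r = 500 * 4 ** (d - 8)      # equation (8)
    m = 700 * 4 ** (d - 8)      # equation (9)
    vx = clip3(-th_bio, th_bio, -s3 / (s1 + r)) if (s1 + r) > m else 0
    vy = (clip3(-th_bio, th_bio, -(s6 - vx * s2 / 2) / (s5 + r))
          if (s5 + r) > m else 0)
    return vx, vy

# Made-up sums for a 10-bit video; th_bio follows 12 x 2^(13-d).
print(solve_bio_motion(s1=5000, s2=120, s3=-900, s5=4200, s6=300,
                       d=10, th_bio=12 * 2 ** (13 - 10)))
```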
[0097] In some cases, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a certain threshold, thBIO. The threshold value is determined based on whether the reference pictures of the current picture are all from one direction. If all the reference pictures of the current picture are from one direction, the threshold value is set to $12 \times 2^{14-d}$; otherwise, it is set to $12 \times 2^{13-d}$.
[0098] Gradients for BIO can be calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process (2D separable FIR). The input for this 2D separable FIR is the same reference frame sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d-8, and then the gradient filter BIOfilterG is applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18-d. For the vertical gradient, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d-8, and then signal displacement is performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18-d. The length of the interpolation filter for gradient calculation, BIOfilterG, and for signal displacement, BIOfilterS, can be shorter (6 taps) in order to maintain reasonable complexity. Table 1 shows the filters that can be used to calculate gradients for different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters that can be used for BIO prediction signal generation.
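A hedged sketch of the separable gradient computation just described. Because Tables 1 and 2 are not reproduced here, the 6-tap coefficients below are placeholders only, not the actual table entries; the shifts follow the d-8 and 18-d de-scaling mentioned above.

```python
# Hedged sketch of the horizontal-gradient path: vertical interpolation
# with BIOfilterS, then the gradient filter BIOfilterG horizontally.
# Placeholder taps only; real per-fractional-position filters are in
# Tables 1 and 2 (not reproduced here). ref is an integer numpy array.
import numpy as np

BIOFILTER_S = np.array([1, -5, 20, 20, -5, 1])   # placeholder taps
BIOFILTER_G = np.array([1, -4, -57, 57, 4, -1])  # placeholder taps

def horizontal_gradient(ref, d=10):
    shift_s, shift_g = d - 8, 18 - d
    # Vertical pass: signal interpolation (BIOfilterS), de-scaled by d-8.
    tmp = np.apply_along_axis(
        lambda c: np.convolve(c, BIOFILTER_S[::-1], mode="valid"), 0, ref
    ) >> shift_s
    # Horizontal pass: gradient filter (BIOfilterG), de-scaled by 18-d.
    return np.apply_along_axis(
        lambda row: np.convolve(row, BIOFILTER_G[::-1], mode="valid"), 1, tmp
    ) >> shift_g
```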
[0099] Figure 5 shows an example of the gradient calculation for an 8x4 block (shown as current block 190 in Figure 5). For the 8x4 block, a video coder fetches the motion-compensated predictors (also referred to as MC predictors) and calculates the HOR/VER gradients of the pixels within current block 190 as well as of the two outer lines of pixels, because solving vx and vy for each pixel uses the HOR/VER gradient values and the motion-compensated predictors of the pixels within the window Ω centered on each pixel, as shown in equation (4). In JEM, for example, the size of this window is set to 5x5, which means that the video coder fetches the motion-compensated predictors and calculates the gradients for the two outer lines of pixels. Window 192 represents the 5x5 window centered on pixel A, and window 194 represents the 5x5 window centered on pixel B.
[0100] In JEM, for example, BIO is applied to all bidirectionally predicted blocks when the two predictions come from different reference pictures. When local illumination compensation (LIC) is enabled for a CU, BIO is disabled.
[0101] At the 5th JVET meeting, proposal JVET-E0028, A. Alshin, E. Alshina, "EE3: bi-directional optical flow w/o block extension", JVET-E0028, January 2017, was submitted to modify the BIO operations and reduce the memory access bandwidth. In this proposal, no MC predictors and gradient values are needed for pixels outside the current block. Furthermore, the solving of vx and vy for each pixel is modified to use the MC predictors and the gradient values of all the pixels within the current block, as shown in Figure 6.
[0102] Figure 6 shows an example of modified BIO for an 8x4 block (shown as current block 200) according to the techniques proposed in JVET-E0028. A simplified version of JVET-E0028 was proposed to address the issue of mismatch between the results of block-level and sub-block-level BIO processes. Instead of using the neighborhood Ω with all the pixels in a CU, the proposed method modifies the neighborhood Ω to include only 5x5 pixels centered on the current pixel, without any interpolation or gradient calculation for pixel locations outside the current CU.
[0103] The video encoder 20 and the video decoder 30 can also perform Overlapped Block Motion Compensation (OBMC). The following description refers to OBMC as currently implemented in JEM, but the video encoder 20 and the video decoder 30 can also perform other types of OBMC. OBMC has been used in earlier generations of video standards, for example, in H.263. In JEM, OBMC is performed for all motion compensated (MC) block boundaries, except for the right and bottom boundaries of a CU. Furthermore, OBMC is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with sub-CU mode (including sub-CU merge, affine and FRUC merge modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform manner, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set equal to 4x4, as illustrated in Figures 7A and 7B.
[0104] When OBMC applies to the current sub-block, in addition to the current motion vectors, the motion vectors of the four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive a predictive block for the current sub-block. These multiple predictive blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block.
[0105] In the following examples, a predictive block based on the motion vectors of a neighboring sub-block is denoted as PN, where N indicates an index for the above, below, left and right neighboring sub-blocks, and a predictive block based on the motion vectors of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed for PN. Otherwise, every pixel of PN is added to the same pixel in PC; that is, four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN, and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exceptions are small MC blocks (i.e., when the height or width of the coding block is equal to 4, or when a CU is coded with sub-CU mode), for which only two rows/columns of PN are added to PC. In this case, the weighting factors {1/4, 1/8} are used for PN, and the weighting factors {3/4, 7/8} are used for PC. For a PN generated based on the motion vectors of a vertically neighboring sub-block, the pixels in the same row of PN are added to PC with the same weighting factor. For a PN generated based on the motion vectors of a horizontally neighboring sub-block, the pixels in the same column of PN are added to PC with the same weighting factor. Note that BIO can also be applied to the derivation of the final predictive block.
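A hedged sketch of the blending rule just described, shown for a PN derived from the above-neighboring sub-block's MV (row-wise weights); array handling via numpy, names illustrative.

```python
# Hedged sketch of OBMC blending: four rows of the neighbour-based
# prediction PN are blended into the current prediction PC with
# weights {1/4, 1/8, 1/16, 1/32}, so PC keeps {3/4, 7/8, 15/16, 31/32}.
import numpy as np

OBMC_WEIGHTS = (1 / 4, 1 / 8, 1 / 16, 1 / 32)

def blend_from_above(PC, PN):
    out = PC.astype(float).copy()
    for row, w in enumerate(OBMC_WEIGHTS):   # top four rows only
        out[row] = (1 - w) * PC[row] + w * PN[row]
    return out
```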
[0106] Figure 7A shows inter CU 210, which includes 4x4 sub-blocks. For current sub-block 212, the MVs of left neighboring sub-block 214 and above neighboring sub-block 216 are used in performing OBMC for current sub-block 212.
[0107] Figure 7B shows inter CU 220, which includes 4x4 sub-blocks. For current sub-block 222, the MVs of left neighboring sub-block 226, above neighboring sub-block 224, below neighboring sub-block 228, and right neighboring sub-block 230 are used in performing OBMC for current sub-block 222.
[0108] [0108] Figures 8A to 8D illustrate a process for determining a predictive block for the current sub-block 222 of Figure 7B. In the example of Figure 8A, the OBMC prediction of the current sub-block 222 is equal to a weighted average of the predictive sub-block determined using the MV of the above neighboring block 224 and the predictive sub-block determined for the current sub-block using the MV of the current sub-block. In the example of Figure 8B, the OBMC prediction of the current sub-block 222 is equal to a weighted average of the predictive sub-block determined using the MV of the left neighboring block 226 and the predictive sub-block determined for the current sub-block using the MV of the current sub-block. In the example of Figure 8C, the OBMC prediction of the current sub-block 222 is equal to a weighted average of the predictive sub-block determined using the MV of the below neighboring block 228 and the predictive sub-block determined for the current sub-block using the MV of the current sub-block. In the example of Figure 8D, the OBMC prediction of the current sub-block 222 is equal to a weighted average of the predictive sub-block determined using the MV of the right neighboring block 230 and the predictive sub-block determined for the current sub-block using the MV of the current sub-block.
[0109] [0109] In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether or not OBMC is applied to the current CU. For CUs with size greater than 256 luma samples, or not coded with AMVP mode, OBMC is applied by default. In video encoder 20, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed using the motion information of the top neighboring block and the left neighboring block is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied. A sketch of the flag-presence rule is given below.
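For purposes of illustration only, a minimal sketch of the flag-presence rule described above, assuming the CU size is expressed in luma samples:

```python
def obmc_flag_signalled(cu_luma_samples: int, coded_with_amvp: bool) -> bool:
    """Whether a CU-level OBMC on/off flag is present, per the JEM rule above:
    small AMVP-coded CUs carry the flag; all other CUs apply OBMC by default."""
    return cu_luma_samples <= 256 and coded_with_amvp
```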
[0110] [0110] BIO can be considered as a post-processing of regular CU-level or sub-block-level MC. While existing BIO implementations offer some coding performance improvements, existing implementations also present complexity issues for both software and hardware designs.
[0111] [0111] Figure 9 shows a flow diagram of a BIO design. In the example of Figure 9, a video coder performs a bi-predictive motion compensation process (MC 240) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). P0 represents the predictive block generated by MV0, which points to Ref0 in L0. P1 represents the predictive block generated by MV1, which points to Ref1 in L1. The final predictive block of the bi-predictive motion compensation process can, for example, be an average or weighted average of P0 and P1. The video coder performs a BIO process (BIO 242) on the predictive block to determine a BIO-refined predictive block (P). The video coder applies an OBMC process (OBMC 244) to determine a motion-compensated predictive block (P0'/P1'). The video coder applies a second BIO process (BIO 246) to generate a final predictive block (P”).
[0112] [0112] In the example of Figure 9, bi-predictive motion compensation is followed by BIO filtering for both regular MC and OBMC, and therefore the BIO process is invoked multiple times for the same sub-block. This lengthens the overall motion compensation process, and the BIO may require extra bandwidth on top of the OBMC. Existing BIO implementations use division operations to calculate the fine-granularity motion vectors, and pixel-wise division operations are expensive in hardware design because multiple copies of the dividers are typically required to achieve sufficient throughput, resulting in a high demand for silicon area. Regarding motion estimation, BIO is a process that refines the MV within a small motion search range. Existing BIO implementations update the MC predictors as a result. However, the motion vectors stored in the MV buffer are not updated accordingly after the refinement, causing an asynchronous design between the MC predictors and the associated motion vectors. Furthermore, the motion vector refinement calculation currently employs 6-tap interpolation filters and a gradient filter, which results in increased complexity.
[0113] [0113] This disclosure describes techniques that can address the issues described above in relation to known BIO implementations. The following techniques can be applied individually, or alternatively, in any combination.
[0114] [0114] According to one of the techniques of this disclosure, a block-based BIO can be designed so that a group of pixels is used to generate a single motion vector refinement for all pixels in the group. The block size can be a predefined size, including but not limited to 2x2 and 4x4.
[0115] [0115] The block size can be adaptively selected. For example, it can be based on the resolution of the frame being coded, the size of the entire CU, the temporal layer of the current picture, the quantization parameter (QP) used to code the current picture, and/or the coding mode of the current CU.
[0116] [0116] Equation (4) above is solved for a square window Ω, which includes the block itself and a neighborhood of the block under consideration. In one example, the size of Ω is 8x8, where the central 4x4 region contains the group of pixels under consideration for calculating the BIO offsets and the surrounding 2-pixel region is the neighborhood of the block.
[0117] [0117] A weighting function, which can take, including but not limited to, the form of Equation (10) above, can be used to provide different weights for pixels at different locations within the window. In one example, pixels in the central part of Ω are assigned greater weights than pixels around the boundary of Ω. A weighted average can be used to calculate the average value of the terms in Equation (7) in order to solve for vx and vy for the entire block. In some examples, a median filter can be applied to exclude outliers in the block before calculating the weighted average, yielding a more stable solution to Equation (4). A sketch of this window-based solve is given below.
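As a non-limiting sketch only, the following shows a weighted window solve of the kind described above. The specific weight values, the s-term formulation and the omission of the normative signs and clipping are assumptions; this is not a reproduction of Equations (4) to (10), which do not appear in this portion of the text.

```python
import numpy as np

def window_weights(size: int = 8) -> np.ndarray:
    """Example weighting over an 8x8 window Omega: the central 4x4 group gets
    weight 2, the surrounding 2-pixel ring weight 1. The exact values are an
    assumption; the text only requires larger weights near the centre."""
    w = np.ones((size, size))
    c = size // 4
    w[c:size - c, c:size - c] = 2.0
    return w

def weighted_bio_solve(gx0, gy0, gx1, gy1, p0, p1, w):
    """One (vx, vy) for the whole window from weighted averages of the
    gradient terms, assuming a standard least-squares form of the BIO
    equations. gx*/gy* are L0/L1 gradients over the window, p0/p1 the
    L0/L1 predictors, w the per-pixel weights."""
    psi_x, psi_y, dp = gx0 + gx1, gy0 + gy1, p0 - p1
    s1 = np.sum(w * psi_x * psi_x)
    s2 = np.sum(w * psi_x * psi_y)
    s3 = np.sum(w * dp * psi_x)
    s5 = np.sum(w * psi_y * psi_y)
    s6 = np.sum(w * dp * psi_y)
    vx = s3 / s1 if s1 > 0 else 0.0          # guard the divisions
    vy = (s6 - vx * s2) / s5 if s5 > 0 else 0.0
    return vx, vy
```

A median pre-filter over dp, as mentioned above, could be inserted before the sums to reject outliers.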
[0118] [0118] Additionally, if information indicating that a pixel belongs to an object occluded between Ref0 and Ref1 is available, the neighboring pixels belonging to the occluded object can be assigned smaller weights. In one example, pixels belonging to occluded objects can be assigned a weight of 0, while for the other pixels the weights remain unchanged. This allows pixel-level control over whether a specific pixel location is involved in the BIO derivation.
[0119] [0119] The neighborhood range for BIO can be predefined. In some examples, the range may be signaled via an SPS, PPS, slice header, or other such data structure. In some examples, the range may be made adaptive based on coding information, including but not limited to the BIO block size, the CU size, or the frame resolution.
[0120] [0120] According to another technique of this disclosure, the motion vector of a block can be updated after the BIO motion refinement. In this process, the motion vector (or motion field) of a block is refined by adding the motion information offsets derived from BIO. The update can take place after the regular MC process of the current block, refining the MV of the current CU/block before the OBMC of the subsequent CUs/blocks, so that the updated MV is involved in the OBMC operation of the subsequent CUs/blocks. In some examples, the update may occur after the OBMC of the subsequent CUs, so that the updated motion vector is only used for motion vector prediction. The updated MV can, for example, be used for any one of, or any combination of, AMVP mode, merge mode and FRUC mode.
[0121] [0121] In some BIO implementations, the gradient of a fractional sample position is determined based on the integer samples of the reference pictures, performing interpolation processes in the horizontal and/or vertical directions. To simplify the gradient calculation process, the gradient can be calculated based on the prediction samples that have already been interpolated based on the existing MV of the current block/CU. The gradient calculation can be applied to the prediction samples at different stages during prediction sample generation. For example, to generate the prediction samples for a bi-prediction block, a video coder first generates the L0 prediction samples and the L1 prediction samples and then applies a weighted average to the L0 and L1 prediction samples to generate the bi-prediction samples. When OBMC is enabled, the generated bi-prediction samples are additionally weight-averaged with the prediction samples obtained using the neighboring MVs to generate the final prediction samples. In this example, the gradient calculation can be applied to the L0 and L1 prediction samples independently; or the gradient calculation can be applied only to the bi-prediction samples and the final prediction samples, under the assumption that the L0 and L1 predictors have the same gradient values. That is, instead of calculating the gradient values separately using Ref0/Ref1 and summing them during the derivation of the BIO motion vectors/offsets, the gradient calculation on the bi-prediction samples can obtain the summed gradient values in a single step.
[0122] [0122] In one implementation, a 2-tap gradient filter is applied to the prediction samples to calculate the gradients. Let the current pixel position in a block be (x, y), and let the MC predictor at that location be denoted P(x, y). The gradient value at (x, y) can then be calculated from a scaled difference of neighboring predictor samples, where K and S are scaling factors that can be preset values, W denotes the block width, and H denotes the block height. Note that the location (x, y) can be at any fractional-pel location after interpolation. In one example, the values can be (24, 12, 8) or (26, 13, 8). These values can be signaled in an SPS, PPS, slice header, or other data structure. One plausible reading of this filter is sketched below.
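Because the equation itself is not reproduced above, the following sketch shows only one plausible reading of a 2-tap (central-difference) gradient on the prediction samples, with one-sided differences at the block boundary. Treating K as a gain and S as a right-shift amount is an assumption.

```python
import numpy as np

def gradients_2tap(p: np.ndarray, k: float = 1.0, s: int = 1):
    """Central-difference gradients of a prediction block P; edge samples
    fall back to one-sided differences. k and s stand in for the scaling
    factors K and S mentioned above (values assumed; require s >= 1)."""
    h, w = p.shape
    gx = np.zeros_like(p, dtype=np.float64)
    gy = np.zeros_like(p, dtype=np.float64)
    # interior: difference over two samples, scaled by k / 2**s
    gx[:, 1:w-1] = k * (p[:, 2:] - p[:, :w-2]) / (1 << s)
    gy[1:h-1, :] = k * (p[2:, :] - p[:h-2, :]) / (1 << s)
    # boundaries: one-sided difference over one sample, scaled by k / 2**(s-1)
    gx[:, 0]   = k * (p[:, 1]   - p[:, 0])   / (1 << (s - 1))
    gx[:, w-1] = k * (p[:, w-1] - p[:, w-2]) / (1 << (s - 1))
    gy[0, :]   = k * (p[1, :]   - p[0, :])   / (1 << (s - 1))
    gy[h-1, :] = k * (p[h-1, :] - p[h-2, :]) / (1 << (s - 1))
    return gx, gy
```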
[0123] [0123] In one example, a longer-tap gradient filter can be applied to the prediction samples to calculate the gradients. For example, a filter with coefficients {8, -39, -3, 46, -17, 5} can be applied. In some examples, a filter with coefficients {1, -5, 0, 5, -1}, or another symmetric filter, is used. In some examples, a filter with coefficients {10, -44, 0, 44, -10, 0} is used.
[0124] [0124] According to another technique of this disclosure, the BIO process within OBMC can be entirely or conditionally removed. BIO can use the reference samples to generate the offsets, or it can use the MC/OBMC predictors to generate the offsets. The generated BIO offsets are added to the MC predictors or the OBMC predictors as motion vector refinement.
[0125] [0125] Figure 10 shows a flow diagram of a simplified BIO according to the techniques of this disclosure. In the example of Figure 10, a video coder performs a bi-predictive motion compensation process (MC 270) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 272) on the predictive block to determine a BIO-refined predictive block (P').
[0126] [0126] Figure 10 shows an example of BIO derived from Ref0/Ref1 and applied to the MC predictors P0/P1. In the example of Figure 10, the BIO process within OBMC, e.g., BIO 246 in Figure 9, is removed. The BIO offsets are derived from MV0/MV1, Ref0/Ref1 and the MC predictors P0/P1, and the offsets are added to P0/P1 during the bi-averaging. The predictor P' is the final predictor of the overall MC process. Dotted lines in the figure indicate motion vector information, and solid lines indicate actual pixel data for either prediction or reference samples. In Figure 10, the BIO operation following the MC uses the MC predictors P0/P1 together with the gradient values derived from Ref0/Ref1 using motion vectors MV0/MV1 to calculate the motion vector refinement and offsets. The BIO output (P') is generated by a bi-average of P0/P1 plus the BIO offsets on a per-pixel basis (even with block-level BIO, where the motion vector refinement remains the same within the block, the BIO offset may still be per-pixel, since the gradient values for each pixel may be different).
[0127] [0127] Figure 11 shows a flow diagram of a simplified BIO implementation according to the techniques of this disclosure. In the example of Figure 11, a video coder performs a bi-predictive motion compensation process (MC 280) to determine a predictive block (P) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1), applies an OBMC process to determine OBMC predictors (P0'/P1'), and performs a BIO process on those predictors to determine a final predictive block (P”).
[0128] [0128] Figure 11 shows an example of BIO derived from Ref0/Ref1 and applied to the OBMC predictors P0'/P1'. The BIO offsets are derived from MV0/MV1, Ref0/Ref1 and the OBMC predictors P0'/P1', and the offsets are added to P0'/P1' during the bi-averaging. The predictor P” is the final predictor of the overall MC process.
[0129] [0129] Figure 12 shows a flow diagram of a simplified BIO according to the techniques of this disclosure. In the example of Figure 12, a video coder performs a bi-predictive motion compensation process (MC 290) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 292) on the predictive block to determine a BIO-refined predictive block and, in parallel, performs an OBMC process (OBMC 294) on the predictive block to determine a motion-compensated predictive block (P'). The video coder sums (296) the BIO-refined predictive block and the motion-compensated predictive block to determine a final predictive block (P”).
[0130] [0130] Figure 12 shows an example of BIO derived from and applied to the MC predictors P0/P1. The gradient values are calculated using MV0/MV1 and Ref0/Ref1 and are then used to generate the BIO offsets together with the MC predictors P0/P1. The offsets are added to the MC predictor P' to generate the final predictor P” of the overall MC process.
[0131] [0131] Figure 13 shows a flow diagram of a simplified BIO according to the techniques of this disclosure. In the example of Figure 13, a video coder performs a bi-predictive motion compensation process (MC 300) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 302) on the predictive block to determine a BIO-refined predictive block (P'). The video coder applies an OBMC process (OBMC 304) to the BIO-refined predictive block to determine a final predictive block (P”).
[0132] [0132] Figure 13 shows an example of BIO derived from and applied to the MC predictors P0/P1. The BIO offsets are calculated using the MC predictors P0/P1, and the offsets are added to P0/P1 during the bi-averaging, followed by an OBMC process to generate the final predictor P” of the overall MC process.
[0133] [0133] Figure 14 shows a flow diagram of a simplified BIO implementation according to the techniques of this disclosure. In the example of Figure 14, a video coder performs a bi-predictive motion compensation process (MC 310) to determine a predictive block (P) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder applies an OBMC process (OBMC 312) to the predictive block to determine a motion-compensated predictive block (P0'/P1'). The video coder applies a BIO process (BIO 314) to the motion-compensated predictive block to determine a final predictive block (P”).
[0134] [0134] Figure 14 shows a simplified BIO example using only one MC predictor. The gradient values are derived using the OBMC predictors P0'/P1' and the motion vectors MV0/MV1, and the BIO offsets are calculated using the OBMC predictors P0'/P1'. The offsets are added to P0'/P1' during the bi-averaging to generate the final predictor P” of the overall MC process.
[0135] [0135] In one example, BIO within OBMC can be conditionally disabled. Let MVCURX and MVNBRX be the motion vectors of the current block and the neighboring block for ListX (where X is 0 or 1) during the OBMC process. In one example, if the absolute value of the motion vector difference between MVCUR0 and MVNBR0 and the absolute value of the motion vector difference between MVCUR1 and MVNBR1 are both less than a threshold, BIO within OBMC can be disabled. The threshold may be signaled in an SPS, PPS, slice header, or other such data structure, or a predefined value (e.g., half a pixel, one pixel, or any value equal to the search range of the BIO motion vector refinement) can be used. In another example, if the absolute value of the motion vector difference between MVNBR0 and MVNBR1 is less than a threshold, BIO within OBMC can be disabled. A sketch of the first condition is given below.
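For illustration only, a sketch of the first disabling condition, assuming the MV difference is evaluated per component and that the threshold is expressed in the MV storage precision:

```python
def bio_enabled_for_obmc(mv_cur, mv_nbr, threshold) -> bool:
    """Return False when BIO inside OBMC should be skipped: both the List0
    and List1 differences between the current and neighbouring MVs must be
    below the threshold. mv_cur and mv_nbr are ((x0, y0), (x1, y1)) pairs;
    the per-component comparison is an assumption made here."""
    for (cx, cy), (nx, ny) in zip(mv_cur, mv_nbr):
        if abs(cx - nx) >= threshold or abs(cy - ny) >= threshold:
            return True   # a large enough difference in at least one list
    return False          # both list differences below threshold: disable BIO
```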
[0136] [0136] In one example, the number of BIO operations in the overall MC process is limited to a predetermined value. For example, the BIO process is performed at most N times (e.g., N can be 1 or any positive integer) for each block (the block can be a CTU, CU, PU or an MxN block). In one example, BIO is only allowed to be performed once for each block. When the prediction samples are generated using the current motion information with BIO applied, no additional BIO is allowed for generating the other prediction samples for the current block, such as OBMC or any other method to refine the prediction samples. However, when the prediction samples are generated using the current motion information without BIO applied, at most one BIO is allowed for generating the other prediction samples for the current block, such as OBMC or any other method to refine the prediction samples.
[0137] [0137] According to techniques of this disclosure, a block-based BIO design is proposed. Instead of pixel-level motion refinement (e.g., as in JEM5), the motion refinement is done on a 4x4 block basis. In block-based BIO, the weighted sum of the gradients of the samples in a 4x4 block is used to derive the BIO motion vector offsets for the block.
[0138] [0138] The other processes, such as the gradient calculation and the derivation of the BIO motion vectors and offsets, can, for example, follow the same procedures as in the various iterations of JEM. After the refined MV for each 4x4 block is obtained with block-based BIO, the MV buffer is updated and used for subsequent CU coding. The general block diagram is shown in Figure 15, where OBMC is applied without a BIO operation.
[0139] [0139] Figure 15 shows an example of an application of BIO according to the techniques of this disclosure. In the example of Figure 15, a video coder performs a bi-predictive motion compensation process (MC 320) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 322) on the predictive block to determine a BIO-refined predictive block (P). The video coder applies an OBMC process to the BIO-refined predictive block to determine a final predictive block (P').
[0140] [0140] This portion of the disclosure will now describe several simplified architectures that can be implemented in combination with the techniques described above. Among these architectures, this disclosure describes, for purposes of exemplification only, the scenario in which a simplified gradient filter is used to derive the gradient values based on the interpolated samples. That is, the reference samples are not directly needed during the BIO offset derivation; instead, the regular prediction samples are generated, followed by the gradient calculation.
[0141] [0141] In some of the techniques described above, due to the pixel extension required to calculate the intermediate values for the BIO offsets, for a block of size WxH and an N-tap interpolation filter (e.g., the 8-tap filter used in existing HEVC and JEM), the number of reference samples needed is (W+N-1+4)x(H+N-1+4), assuming the extension is 2 pixels. This increases the bandwidth requirement compared to existing MC interpolation, where the number of reference samples needed is (W+N-1)x(H+N-1).
[0142] [0142] In some of the techniques described above, the synchronization of the motion information imposes a dependency issue, as the motion vectors change during the motion compensation process. This can create difficulty for some implementations where latency is critical: when the motion vector changes during the MC process, latency-reduction techniques such as prefetching reference data may not perform effectively. Furthermore, the additional interpolation filter of existing BIO can introduce additional computational and storage complexity. To address these issues, this disclosure introduces several techniques. The following techniques can be implemented individually or in any combination.
[0143] [0143] In some of the techniques described above, due to the sample extension required to calculate the intermediate values for the BIO offsets, the number of reference samples required is increased compared to the existing MC interpolation process. In this disclosure, several examples are described in which the number of reference samples used to derive the BIO offsets is limited to the same set of samples used in the regular interpolation process. For example, when an N-tap MC interpolation filter is used, the number of reference samples required is limited to (W+N-1)x(H+N-1). This can be achieved in several ways.
[0144] [0144] Figure 16 shows an illustration of pixel space 328, which includes the groups of pixels used to apply a typical BIO process. In the example of Figure 16, pixel space 328 includes a first group of pixels representing the pixels for which BIO is being performed. Pixel space 328 also includes a second group of pixels, which is used to perform BIO on the first group of pixels. For BIO, interpolation filtering is performed for the second group of pixels, and performing that interpolation filtering requires additional pixels. These additional pixels are shown in pixel space 328 as a third group of pixels and a fourth group of pixels. Therefore, in order to perform BIO on the first group of pixels, the second, third and fourth groups of pixels need to be stored and fetched from memory. As will be explained in more detail below, the techniques of this disclosure can reduce the size of the pixel space needed to perform BIO, which can allow BIO to be performed without storing or fetching the fourth group of pixels.
[0145] [0145] According to a technique of this disclosure, a shorter-tap interpolation filter can be used so as to stay within the limited set of reference samples. This includes, but is not limited to, the bilinear filter (2-tap) or the HEVC chroma filter (4-tap). If the filter length for luma interpolation is N and the extension size is S, any interpolation filter with filter length less than or equal to N - 2S can satisfy the same memory requirement. The shorter-tap interpolation filter can be applied only to the extended regions for the BIO offset calculation; the normal MC interpolation filter is still applied to generate the MC output of the current block.
[0146] [0146] In accordance with a technique of this disclosure, reference sample repetition can be exploited to extend the samples at the boundary of the reference sample block to the places where additional samples are needed for the calculation of the intermediate values for the BIO offsets, as illustrated in Figure 16. The amount of sample repetition can vary depending on the size of the extended region for the BIO calculation. For example, when the N-tap MC interpolation filter is used, a reference sample block with size (W+N-1)x(H+N-1) is first fetched. Then, sample repetition is applied to all boundaries of the reference block to generate an extended reference block with a size of (W+N-1+2S)x(H+N-1+2S), where S is the extension size for the BIO offset calculation. Then, the BIO and MC processes described above are applied to the extended reference block. A sketch of this extension is given below.
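For illustration only, the boundary repetition can be expressed compactly with numpy's edge padding; the function name is introduced here as an assumption, and under this padding mode the corner regions simply replicate the nearest boundary sample.

```python
import numpy as np

def extend_reference_block(ref: np.ndarray, s: int) -> np.ndarray:
    """Extend a fetched (W+N-1)x(H+N-1) reference block by S samples on every
    side using boundary repetition, yielding (W+N-1+2S)x(H+N-1+2S) without
    fetching any extra reference samples."""
    return np.pad(ref, pad_width=s, mode='edge')

# Example: a 2-sample extension for the BIO offset calculation.
# ext = extend_reference_block(ref, s=2)
```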
[0147] [0147] In one example, horizontal repetition is applied first, followed by vertical repetition. In other examples, vertical repetition is applied first, followed by horizontal repetition.
[0148] [0148] Due to the reference sample repetition that occurs at a CU boundary, the memory-constrained BIO techniques described above, which limit the number of reference samples used to derive the BIO offsets, may yield different results for block sizes that have the same motion information. For example, when performing memory-constrained BIO for the same set of samples but with different processing sizes (for example, one 2NxM block or two NxM blocks, respectively) with the same motion information, the generated prediction samples may be different. That is, the delimitation of the processing unit affects the BIO results. To deal with this, in one example, when deriving the BIO offset, the gradient calculation and the interpolation for the BIO offset always take place on an NxM block basis; that is, for BIO processing, the processing unit is limited to NxM. When M and N are equal to the minimum block size at which MC occurs, any CU can be partitioned into an integer number of such sub-blocks. When MxN is larger than the minimum block size, in one example, the CUs are still partitioned into an integer number of MxN units, while for the remaining part of the CU, the minimum block size must be used for BIO processing. In some examples, the remaining part of the CU may use boundaries that are the union of both the MxN block grid and the true CU boundary. The values of N and M (which can both be equal to 4, for example, in JEM implementations) can be predefined or signaled in a Video Parameter Set (VPS), SPS, PPS, slice header, CTU, or CU. This provides an alternative implementation option for the motion compensation process and creates consistency across multiple block sizes for motion compensation. A sketch of this partitioning is given below.
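For illustration only, a sketch of the NxM partitioning rule, assuming N = M = 4 equals the minimum MC block size so that the CU divides evenly:

```python
def bio_processing_units(cu_w: int, cu_h: int, n: int = 4, m: int = 4):
    """Enumerate the NxM processing units a CU is split into for BIO, per the
    rule above. Each tuple is (x, y, width, height) in CU-local coordinates;
    the even-division assumption holds when n and m equal the minimum MC
    block size."""
    assert cu_w % n == 0 and cu_h % m == 0, "CU must be a multiple of the unit size"
    return [(x, y, n, m) for y in range(0, cu_h, m) for x in range(0, cu_w, n)]
```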
[0149] [0149] In one example, the motion information synchronization can be used by the motion compensation process only. In one example, the synchronization occurs after the regular MC process and before the OBMC process, but the synchronization does not affect the motion prediction of subsequent blocks. That is, the synchronization does not update the contents of the motion information buffer. This provides more flexibility when processing blocks in parallel. The motion vectors used in the motion compensation process for a block and the motion vectors stored for that block can be different.
[0150] [0150] In one example, the BIO process that uses the reference samples to generate offsets and the associated motion vector values are only used during the OBMC process, but do not propagate to MV prediction or merge candidates for the following CUs.
[0151] [0151] Figure 17 shows an illustration of BIO derived from Ref0/Ref1 and applied to the MC predictors P0/P1. In the example of Figure 17, a video coder performs a bi-predictive motion compensation process (MC 330) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 332) on the predictive block to determine a BIO-refined predictive block (P). The video coder applies an OBMC process (OBMC 334) to the BIO-refined predictive block to determine a final predictive block (P').
[0152] [0152] In the example of Figure 17, the motion vectors derived by BIO (B-MV) are used only by the OBMC process. In some examples, when a BIO process is invoked during the OBMC process, as shown in this example, the MVs do not need to be updated.
[0153] [0153] Figure 18 shows an illustration of BIO derived from and applied to the MC predictors P0/P1. In the example of Figure 18, a video coder performs a bi-predictive motion compensation process (MC 340) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 342) on the predictive block to determine a BIO-refined predictive block (P'). The video coder applies an OBMC process (OBMC 344) to the BIO-refined predictive block to determine a final predictive block (P”). In the example of Figure 18, the prediction samples generated by the regular motion compensation process are input to BIO 342. The refined motion vector (B-MV) is then input to OBMC 344. The MV prediction for subsequent CUs, however, uses the same MVs as the neighboring MVs, without any BIO MV refinement.
[0154] [0154] Figure 19 shows a simplified BIO illustration using only one MC predictor. In the example of Figure 19, a video coder performs a bi-predictive motion compensation process (MC 350) to determine a predictive block (P) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder applies an OBMC process (OBMC 352) to the predictive block to determine a motion-compensated predictive block (P0'/P1'). The video coder applies a BIO process (BIO 354) to the motion-compensated predictive block to determine a final predictive block (P”). When the BIO process is the last stage of the overall MC process, as in Figure 19, the motion refinement can occur conditionally, which means that the MV predictors for subsequent CUs can be either the BIO-refined MVs or the MVs used for bi-prediction. The condition can be based on signaling through high-level syntax, such as in a VPS, SPS, PPS, slice header, or other such data structure.
[0155] [0155] Figure 20 shows an illustration of BIO derived from and applied to the MC predictors P0/P1 with parallel processing of OBMC and BIO. In the example of Figure 20, a video coder performs a bi-predictive motion compensation process (MC 350) to determine a predictive block (P0/P1) using two motion vectors (MV0 and MV1) and two reference pictures (Ref0 and Ref1). The video coder performs a BIO process (BIO 352) on the predictive block to determine a BIO-refined predictive block and, in parallel, performs an OBMC process (OBMC 354) on the predictive block to determine a motion-compensated predictive block (P'). The video coder sums (356) the BIO-refined predictive block and the motion-compensated predictive block to determine a final predictive block (P”). For BIO operating in parallel with OBMC, as in the example of Figure 20, the motion refinement can occur conditionally. In some examples, the condition may be based on the block size. For example, when the block size is less than or equal to MxN, the BIO-refined motion vectors are used; otherwise, the original MVs are used. The values of M and N can be predefined or signaled from the encoder to the decoder.
[0156] [0156] According to some existing techniques, BIO and OBMC are applied sequentially, that is, OBMC is applied after the BIO offsets are added to the current MC block. To shorten the processing chain, in some example techniques, after the MC process is performed, the OBMC process and the BIO process can be applied in parallel based on the MC output of the current block. An example of the proposed method is shown in Figure 20, where the final prediction is a weighted average of the OBMC and BIO outputs. The weighting information can be predefined or signaled. The weighting can also depend on previously coded information such as the block size, the block mode (such as skip, merge, IC), and the motion vectors. A sketch of this combination is given below.
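As a non-limiting sketch, the parallel combination can be expressed as a simple scalar blend; the 0.5 default and the floating-point arithmetic are assumptions, since, as stated above, the actual weighting can be predefined, signaled, or derived from previously coded information.

```python
import numpy as np

def combine_parallel_obmc_bio(p_obmc: np.ndarray, p_bio: np.ndarray,
                              w_obmc: float = 0.5) -> np.ndarray:
    """Final prediction as a weighted average of the OBMC output and the BIO
    output computed in parallel from the same MC block (as in Figure 20)."""
    return w_obmc * p_obmc + (1.0 - w_obmc) * p_bio
```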
[0157] [0157] When the BIO offsets are derived from the reference samples, the 6-tap interpolation filter used by existing BIO (as of JEM 6.0) can be replaced with a regular interpolation filter, which means that the interpolation filter used for BIO can be the same interpolation filter used for the other inter-prediction modes. In one example, the existing HEVC interpolation filter can be used to generate the fractional-pel prediction samples for the gradient calculation.
[0158] [0158] The use of an additional gradient filter can be removed by incorporating a gradient filter (as described above) using the interpolation filter in the regular motion compensation process. In one example, a 4-tap gradient filter can be used with symmetric coefficients {2, -9, 0, 9, -2}. As described above, these values can be signaled in an SPS, PPS, slice header, or other data structure. In one example, video signals with different resolutions may use different sets of filter coefficients. In some examples, the filter coefficients can be designed based on the fractional-pel position of the motion vector. The filter coefficients can also be predefined based on the above parameters.
[0159] [0159] Figure 21 is a block diagram illustrating an example video encoder 20 that can implement techniques for bidirectional optical flow. Video encoder 20 can perform intra- and inter-coding of video blocks within video slices. Intra-coding uses spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding uses temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) can refer to any of a number of spatially based coding modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporally based coding modes.
[0160] [0160] As shown in Figure 21, video encoder 20 receives a current video block within a video frame to be encoded. In the example of Figure 21, video encoder 20 includes mode selection unit 40, reference picture memory 64 (which may also be referred to as a decoded picture buffer (DPB)), adder 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode selection unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60 and adder 62. A deblocking filter (not shown in Figure 21) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. If used, the deblocking filter would typically filter the output of adder 62. Additional filters (in-loop or post-loop) may also be used in addition to the deblocking filter. Such filters are not shown for the sake of brevity but, if desired, may filter the output of adder 62 (as an in-loop filter).
[0161] [0161] During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice can be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive encoding of the received video block with respect to one or more blocks in one or more reference frames to provide temporal prediction. Intra-prediction unit 46 may alternatively intra-predict the received video block using pixels of one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial prediction. Video encoder 20 can perform multiple coding passes, for example, to select a suitable coding mode for each block of video data.
[0162] [0162] Furthermore, partition unit 48 can partition blocks of video data into sub-blocks, based on the evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into LCUs, and partition each of the LCUs into sub-CUs based on rate-distortion analysis (e.g., rate-distortion optimization). Mode selection unit 40 may additionally produce a quadtree data structure indicative of the partitioning of an LCU into sub-CUs. Leaf-node CUs of the quadtree can include one or more PUs and one or more TUs.
[0163] [0163] Mode selection unit 40 can select one of the prediction modes, intra or inter, for example, based on the error results, and provide the resulting predicted block to adder 50 to generate residual data and to adder 62 to reconstruct the encoded block for use as a reference frame. Mode selection unit 40 also supplies syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
[0164] [0164] Motion estimation unit 42 and motion compensation unit 44 may be fully or partially integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate the motion of video blocks. A motion vector, for example, can indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit), relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference, which can be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values of quarter-pixel positions, one-eighth-pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 can perform a motion search with respect to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
[0165] [0165] Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a predictive block of a reference picture. The reference picture can be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
[0166] [0166] Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference picture lists. Adder 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation with respect to the luma components, and motion compensation unit 44 uses the motion vectors calculated based on the luma components for both the chroma components and the luma components. Mode selection unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
[0167] [0167] Furthermore, motion compensation unit 44 can be configured to perform any or all of the techniques of this disclosure (alone or in any combination). While discussed with respect to motion compensation unit 44, it should be understood that mode selection unit 40, motion estimation unit 42, partition unit 48 and/or entropy encoding unit 56 can also be configured to perform certain techniques of this disclosure, alone or in combination with motion compensation unit 44. In one example, motion compensation unit 44 may be configured to perform the BIO techniques discussed herein.
[0168] [0168] Intraprediction unit 46 can intrapredict a current block, as an alternative to the interprediction performed by motion estimation unit 42 and motion compensation unit 44 as described above. In particular, the intraprediction unit 46 may determine an intraprediction mode to use to encode a current block. In some instances, intraprediction unit 46 may encode a current block using various intraprediction modes, for example during separate encoding passes, and intraprediction unit 46 (or mode selection unit 40, in some examples) can select a suitable intraprediction mode for use from the tested modes. Entropy encoding unit 56 may encode information indicating the selected intraprediction or interprediction mode.
[0169] [0169] Video encoder 20 forms a residual video block by subtracting the prediction data from mode selection unit 40 from the original video block being coded. Adder 50 represents the component or components that perform this subtraction operation. Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values.
[0170] [0170] Following quantization, entropy encoding unit 56 entropy-encodes the quantized transform coefficients. For example, entropy encoding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding technique. In the case of context-based entropy encoding, the context can be based on neighboring blocks. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
[0171] [0171] Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and an inverse transform, respectively, to reconstruct the residual block in the pixel domain. In particular, adder 62 adds the reconstructed residual block to the motion-compensated prediction block previously produced by motion compensation unit 44 or intra-prediction unit 46 to produce a reconstructed video block for storage in reference picture memory 64. The reconstructed video block can be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
[0172] [0172] Figure 22 is a block diagram illustrating an example video decoder 30 that can implement techniques for bidirectional optical flow. In the example of Figure 22, video decoder 30 includes an entropy decoding unit 70, motion compensation unit 72, intra-prediction unit 74, inverse quantization unit 76, inverse transform unit 78, reference picture memory 82 and adder 80.
[0173] [0173] During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 70 of video decoder 30 entropy-decodes the bitstream to generate quantized coefficients, motion vectors or intra-prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.
[0174] [0174] When the video slice is coded as an intra-coded slice (I), intra-prediction unit 74 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture.
[0175] [0175] Motion compensation unit 72 determines the prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, the inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
[0176] [0176] Motion compensation unit 72 can also perform interpolation based on interpolation filters for sub-pixel precision. Motion compensation unit 72 can use the interpolation filters used by video encoder 20 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of the reference blocks. In this case, motion compensation unit 72 can determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce the predictive blocks.
[0177] [0177] Furthermore, motion compensation unit 72 can be configured to perform any or all of the techniques of this disclosure (alone or in any combination). For example, motion compensation unit 72 can be configured to perform the BIO techniques discussed herein.
[0178] [0178] Inverse quantization unit 76 inverse-quantizes, that is, de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70.
[0179] [0179] The inverse transform unit 78 applies an inverse transform, for example an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce the residual blocks in the domain of pixel.
[0180] [0180] After motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform unit 78 with the corresponding predictive blocks generated by motion compensation unit 72. Adder 80 represents the component or components that perform this summing operation. If desired, a deblocking filter can also be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (in the coding loop or after the coding loop) can also be used to smooth pixel transitions or otherwise improve the video quality. The video blocks decoded in a given frame or picture are then stored in reference picture memory 82, which stores the reference pictures used for subsequent motion compensation. Reference picture memory 82 also stores the decoded video for later presentation on a display device, such as display device 32 of Figure 1. For example, reference picture memory 82 can store decoded pictures.
[0181] [0181] Figure 23 is a flow diagram illustrating an example video decoding technique described in this disclosure. The techniques of Figure 23 will be described with reference to a video decoder, such as, but not limited to, video decoder 30. In some cases, the techniques of Figure 23 may be performed by a video encoder, such as video encoder 20, in which case the video decoder corresponds to the decoding loop of the video encoder.
[0182] [0182] In the example of Figure 23, the video decoder determines that a first block of video data is encoded using an inter-prediction mode (400). The video decoder performs interpolation filtering using an N-tap filter to generate an interpolated search space (402), where N is an integer and corresponds to the number of taps of the N-tap filter. The video decoder obtains a first predictive block for the first block of video data in the interpolated search space (404). The video decoder determines that a second block of video data is encoded using a bi-directional inter-prediction mode (406). The video decoder determines that the second block of video data is encoded using a bidirectional optical flow (BIO) process (408). The video decoder performs an inter-prediction process for the second block of video data using the bi-directional inter-prediction mode to determine a second predictive block (410). The video decoder performs the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block (412). In some examples, the number of reference samples used to calculate the intermediate values for the BIO offsets may be limited to an integer sample region of (W+N-1)x(H+N-1), where W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples. A sketch of this flow is given below.
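For illustration only, the following skeleton strings together steps 406 to 414 for the second block. The names bi_predict and bio_refine are placeholders for the codec's own routines and are assumptions, not interfaces defined by this disclosure.

```python
def decode_block_with_bio(block2, mv0, mv1, ref0, ref1,
                          bi_predict, bio_refine):
    """Skeleton of steps 406-414: bi-directional inter prediction followed by
    BIO refinement. bio_refine is expected to respect the
    (W+N-1)x(H+N-1) reference sample limit described above."""
    p = bi_predict(block2, mv0, mv1, ref0, ref1)       # step 410
    p_refined = bio_refine(p, mv0, mv1, ref0, ref1)    # step 412
    return p_refined                                   # step 414 (output)
```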
[0183] [0183] The video decoder can, for example, perform the BIO process on the second block to determine the BIO-refined version of the second predictive block by fetching a block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), generating an extended reference block with a size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, where S is a positive integer value, determining, using sample values in the extended reference block, one or more BIO offsets, and adding the one or more BIO offsets to the second predictive block to determine the BIO-refined predictive block.
[0184] [0184] To generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the video decoder can, for example, repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1) and repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1). To generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the video decoder can additionally or alternatively repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1) and repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[0185] [0185] In some examples, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the video decoder can repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), determine sample values for a top left corner of the extended reference block based on the sample values of the repeated top row and the sample values of the repeated left column, determine sample values for a top right corner of the extended reference block based on the sample values of the repeated top row and the sample values of the repeated right column, determine sample values for a bottom left corner of the extended reference block based on the sample values of the repeated bottom row and the sample values of the repeated left column, and determine sample values for a bottom right corner of the extended reference block based on the sample values of the repeated bottom row and the sample values of the repeated right column.
[0186] [0186] As illustrated above in the examples of Figures 11, 13 to 18 and 20 to 23, in some examples the video decoder may apply an OBMC process to the second predictive block before performing the BIO process on the second block, or the video decoder can apply an OBMC process after applying the BIO process. If the video decoder applies the OBMC process after applying the BIO process, the video decoder can apply the OBMC process to the BIO-refined predictive block.
[0187] [0187] The video decoder outputs the BIO-refined version of the second predictive block (414). The BIO-refined predictive block may undergo further processing, such as an OBMC process and/or one or more loop filters, before being output. In cases where the video decoder is part of a video encoder, the video decoder can output the BIO-refined predictive block by storing a decoded picture that includes the BIO-refined predictive block in a decoded picture buffer for use as a reference picture in encoding subsequent pictures of video data. In cases where the video decoder is decoding the video data for display, the video decoder can output the BIO-refined predictive block by storing a decoded picture that includes the BIO-refined predictive block in a decoded picture buffer for use as a reference picture in decoding subsequent pictures of video data, and outputting the decoded picture that includes the BIO-refined predictive block, possibly after further processing, such as after applying one or more loop filters to the BIO-refined predictive block, to a display device.
[0188] [0188] It should be recognized that, depending on the example, certain actions or events of any of the techniques described in this document may be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described actions or events are necessary for the practice of the techniques). Also, in certain examples, actions or events may be performed concurrently, for example, through multi-thread processing, interrupt processing, or multiple processors, rather than sequentially.
[0189] [0189] In one or more examples, the functions described may be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as data storage media, or communication media, which include any medium that facilitates the transfer of a computer program from one place to another, for example, according to a communication protocol. In this way, computer-readable media can generally correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[0190] [0190] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer, or any combination thereof. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, then coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media. Magnetic disk and optical disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where magnetic disks usually reproduce data magnetically, while optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0191] [0191] The instructions may be executed by one or more processors, such as one or more DSPs, general-purpose microprocessors, ASICs, FPGAs, or other equivalent discrete or integrated logic circuitry. Accordingly, the term "processor", as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described in this document may be provided within dedicated software and/or hardware modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[0192] [0192] The techniques of this disclosure can be deployed in a wide variety of devices or appliances, including a wireless handset, an integrated circuit (IC), or a set of ICs (eg, a chipset). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization through different hardware units. Instead, as described above, multiple units may be combined into one codec hardware unit or provided by a set of interoperable hardware units, which include one or more processors as described above, together with the software and/or the proper firmware.
[0193] [0193] Several examples have been described. These and other examples are within the scope of the claims that follow.
Claims (27)
[1]
1. Video data decoding method, the method comprising: determining that a first block of video data is encoded using an interprediction mode; performing interpolation filtering using an N-tap filter to generate an interpolated search space, wherein N is an integer and corresponds to the number of taps in the N-tap filter; obtaining a first predictive block for the first block of video data in the interpolated search space; determining that a second block of video data is encoded using a bidirectional interprediction mode; determining that the second block of video data is encoded using a bidirectional optical flow (BIO) process; performing an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; performing the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, wherein a number of reference samples used to calculate intermediate values for BIO offsets is limited to an integer sample region of (W+N-1)x(H+N-1), wherein W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and outputting the BIO-refined version of the second predictive block.
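By way of illustration only (no code forms part of the claims), the following minimal NumPy sketch shows the memory-bandwidth property recited in claim 1: the intermediate values for the BIO offsets are computed from the same (W+N-1)x(H+N-1) integer-sample region that N-tap motion-compensated interpolation already requires, so BIO adds no reference fetches. The function name and the filter-centering convention are assumptions of the sketch, not taken from the disclosure.

```python
import numpy as np

def fetch_reference_region(ref_frame, x0, y0, W, H, N):
    """Illustrative sketch: fetch the (W+N-1) x (H+N-1) integer-sample
    region that an N-tap interpolation filter needs for a W x H block
    whose top-left integer position is (x0, y0). Claim 1 limits the
    reference samples used for BIO intermediate values to this same
    region. The centering convention (N//2 - 1 samples above and to
    the left of the block) is an assumption, and the block is assumed
    to lie far enough inside the frame that the slice stays in bounds."""
    top = y0 - (N // 2 - 1)
    left = x0 - (N // 2 - 1)
    return ref_frame[top:top + H + N - 1, left:left + W + N - 1]
```

For example, with an 8-tap interpolation filter (N = 8, roughly as in HEVC luma interpolation), a 16x16 block needs only a 23x23 region of integer samples.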
[2]
A method according to claim 1, wherein performing the BIO process on the second predictive block to determine the BIO-refined version of the second predictive block comprises: fetching a block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); generating an extended reference block with a size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), wherein S is a positive integer value; determining one or more BIO offsets using sample values in the extended reference block; and adding the one or more BIO offsets to the second predictive block to determine the BIO-refined version of the second predictive block.
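Again purely illustrative: a minimal sketch of the extension step of claim 2 using edge padding, which repeats border samples instead of fetching new ones, followed by the final addition of BIO offsets to the predictive block. The derivation of the offsets themselves is not reproduced; add_bio_offsets and its arguments are assumptions of the sketch.

```python
import numpy as np

def extend_reference_block(ref_block, S):
    """Extend a fetched (W+N-1) x (H+N-1) reference block to
    (W+N-1+2S) x (H+N-1+2S) by repeating its border samples,
    so no additional integer samples are read from reference memory."""
    return np.pad(ref_block, S, mode="edge")

def add_bio_offsets(pred_block, bio_offsets):
    """Final step of claim 2: add the BIO offsets, derived from sample
    values in the extended reference block, to the second predictive
    block to obtain its BIO-refined version."""
    return pred_block + bio_offsets
```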
[3]
A method according to claim 2, wherein generating the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples comprises: repeating a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeating a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[4]
A method according to claim 2, wherein generating the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples comprises: repeating a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeating a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[5]
A method according to claim 2, wherein generating the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples comprises: repeating a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeating a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeating a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeating a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); determining sample values for a top-left corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated left column; determining sample values for a top-right corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated right column; determining sample values for a bottom-left corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated left column; and determining sample values for a bottom-right corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated right column.
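Claims 3 to 5 spell the extension out as repeated border rows and columns plus corner filling. The sketch below, illustrative only and with an assumed array layout, writes that explicit form; for any S it produces the same result as the edge padding shown above.

```python
import numpy as np

def extend_reference_block_explicit(ref_block, S):
    """Explicit form of the extension in claims 3 to 5: repeat the top
    and bottom rows and the left and right columns S times, then fill
    each corner from the adjoining repeated row and column values."""
    Hb, Wb = ref_block.shape                    # (H+N-1, W+N-1)
    out = np.empty((Hb + 2 * S, Wb + 2 * S), dtype=ref_block.dtype)
    out[S:S + Hb, S:S + Wb] = ref_block         # original samples
    out[:S, S:S + Wb] = ref_block[0, :]         # repeated top row
    out[S + Hb:, S:S + Wb] = ref_block[-1, :]   # repeated bottom row
    out[S:S + Hb, :S] = ref_block[:, :1]        # repeated left column
    out[S:S + Hb, S + Wb:] = ref_block[:, -1:]  # repeated right column
    out[:S, :S] = ref_block[0, 0]               # top-left corner
    out[:S, S + Wb:] = ref_block[0, -1]         # top-right corner
    out[S + Hb:, :S] = ref_block[-1, 0]         # bottom-left corner
    out[S + Hb:, S + Wb:] = ref_block[-1, -1]   # bottom-right corner
    return out
```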
[6]
The method of claim 1, further comprising: applying an Overlapped Block Motion Compensation (OBMC) process to the second predictive block before performing the BIO process for the second block.
[7]
The method of claim 1, further comprising: applying an Overlapped Block Motion Compensation (OBMC) process to the BIO-refined predictive block.
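Claims 6 and 7 differ only in whether OBMC is applied to the predictive block before BIO refinement or to the BIO-refined output. A trivial sketch of the two orderings follows; obmc and bio are hypothetical callables standing in for the respective processes, not functions defined by this disclosure.

```python
def combine_obmc_and_bio(pred_block, obmc, bio, obmc_first):
    """obmc and bio are hypothetical stand-ins for the OBMC and BIO
    processes; only the ordering contemplated by the two claims is
    illustrated."""
    if obmc_first:
        return bio(obmc(pred_block))  # claim 6: OBMC, then BIO refinement
    return obmc(bio(pred_block))      # claim 7: BIO refinement, then OBMC
```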
[8]
The method of claim 1, wherein the method for decoding the video data is performed as part of a reconstruction loop of a video encoding process.
[9]
9. Device for decoding video data, the device comprising: a memory configured to store the video data; and one or more processors configured to: determine that a first block of video data is encoded using an interprediction mode; perform interpolation filtering using an N-tap filter to generate an interpolated search space, wherein N is an integer and corresponds to the number of taps in the N-tap filter; obtain a first predictive block for the first block of video data in the interpolated search space; determine that a second block of video data is encoded using a bidirectional interprediction mode; determine that the second block of video data is encoded using a bidirectional optical flow (BIO) process; perform an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; perform the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, wherein a number of reference samples used to calculate intermediate values for BIO offsets is limited to an integer sample region of (W+N-1)x(H+N-1), wherein W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and output the BIO-refined version of the second predictive block.
[10]
Device according to claim 9, wherein, to perform the BIO process on the second predictive block to determine the BIO-refined version of the second predictive block, the one or more processors are further configured to: fetch a block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); generate an extended reference block with a size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), wherein S is a positive integer value; determine one or more BIO offsets using sample values in the extended reference block; and add the one or more BIO offsets to the second predictive block to determine the BIO-refined version of the second predictive block.
[11]
Device according to claim 10, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the one or more processors are further configured to: repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[12]
Device according to claim 10, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the one or more processors are further configured to: repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[13]
Device according to claim 10, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the one or more processors are further configured to: repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); determine sample values for a top-left corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated left column; determine sample values for a top-right corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated right column; determine sample values for a bottom-left corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated left column; and determine sample values for a bottom-right corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated right column.
[14]
The device of claim 9, wherein the one or more processors are further configured to: apply an Overlapped Block Motion Compensation (OBMC) process to the second predictive block before performing the BIO process for the second block.
[15]
The device of claim 9, wherein the one or more processors are further configured to: apply an Overlapped Block Motion Compensation (OBMC) process to the BIO-refined predictive block.
[16]
The device of claim 9, wherein the device for decoding video data comprises a device for encoding video data which performs video decoding as part of a reconstruction loop of a video encoding process.
[17]
The device of claim 9, wherein the device comprises a wireless communication device, which further comprises a receiver configured to receive encoded video data.
[18]
The device of claim 17, wherein the wireless communication device comprises a telephone handset, and wherein the receiver is configured to demodulate, in accordance with a wireless communication standard, a signal comprising the encoded video data.
[19]
The device of claim 9, wherein the device comprises a wireless communication device, which further comprises a transmitter configured to transmit encoded video data.
[20]
The device of claim 19, wherein the wireless communication device comprises a telephone handset, and wherein the transmitter is configured to modulate, in accordance with a wireless communication standard, a signal comprising the encoded video data.
[21]
21. Computer-readable storage medium that stores instructions that, when executed by one or more processors, cause the one or more processors to: determine that a first block of video data is encoded using an interprediction mode; perform interpolation filtering using an N-tap filter to generate an interpolated search space, wherein N is an integer and corresponds to the number of taps in the N-tap filter; obtain a first predictive block for the first block of video data in the interpolated search space; determine that a second block of video data is encoded using a bidirectional interprediction mode; determine that the second block of video data is encoded using a bidirectional optical flow (BIO) process; perform an interprediction process for the second block of video data using the bidirectional interprediction mode to determine a second predictive block; perform the BIO process on the second predictive block to determine a BIO-refined version of the second predictive block, wherein a number of reference samples used to calculate intermediate values for BIO offsets is limited to an integer sample region of (W+N-1)x(H+N-1), wherein W corresponds to a width of the second block in integer samples and H corresponds to a height of the second block in integer samples; and output the BIO-refined version of the second predictive block.
[22]
The computer-readable medium of claim 21, wherein, to perform the BIO process on the second predictive block to determine the BIO-refined version of the second predictive block, the instructions cause the one or more processors to: fetch a block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); generate an extended reference block with a size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1), wherein S is a positive integer value; determine one or more BIO offsets using sample values in the extended reference block; and add the one or more BIO offsets to the second predictive block to determine the BIO-refined version of the second predictive block.
[23]
A computer-readable medium as claimed in claim 22, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the instructions cause the one or more processors to: repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[24]
A computer-readable medium as claimed in claim 22, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the instructions cause the one or more processors to: repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); and repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1).
[25]
A computer-readable medium as claimed in claim 22, wherein, to generate the extended reference block with the size of (W+N-1+2S)x(H+N-1+2S) based on the block of reference samples, the instructions cause the one or more processors to: repeat a top row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a bottom row of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a left column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); repeat a right column of the block of reference samples that corresponds to the integer sample region of (W+N-1)x(H+N-1); determine sample values for a top-left corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated left column; determine sample values for a top-right corner of the extended reference block based on sample values of the repeated top row and sample values of the repeated right column; determine sample values for a bottom-left corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated left column; and determine sample values for a bottom-right corner of the extended reference block based on sample values of the repeated bottom row and sample values of the repeated right column.
[26]
A computer-readable medium as claimed in claim 21, wherein the instructions cause the one or more processors to: apply an Overlapped Block Motion Compensation (OBMC) process to the second predictive block prior to performing the BIO process for the second block.
[27]
The computer-readable medium of claim 21, wherein the instructions cause the one or more processors to: apply an Overlapped Block Motion Compensation (OBMC) process to the BIO-refined predictive block.
Similar technologies:
Publication number | Publication date | Patent title
BR112019026775A2|2020-06-30|Memory-bandwidth-efficient design for bi-directional optical flow (BIO)|
US10523964B2|2019-12-31|Inter prediction refinement based on bi-directional optical flow (BIO)|
JP2020503799A|2020-01-30|Motion vector reconstruction for bidirectional optical flow (BIO)|
RU2705428C2|2019-11-07|Outputting motion information for sub-blocks during video coding
KR102136973B1|2020-07-23|Enhanced bidirectional optical flow for video coding
BR112019019210A2|2020-04-14|restriction motion vector information derived by decoder side motion vector derivation
BR112021005357A2|2021-06-15|improvements to history-based motion vector predictor
BR112019027821A2|2020-07-07|template pairing based on partial reconstruction for motion vector derivation
TWI736872B|2021-08-21|Limitation of the mvp derivation based on decoder-side motion vector derivation
BR112020014522A2|2020-12-08|IMPROVED DERIVATION OF MOTION VECTOR ON THE DECODER SIDE
BR112021001563A2|2021-04-20|method and inter prediction apparatus
Patent family:
Publication number | Publication date
AU2018288866A1|2019-12-05|
US20180376166A1|2018-12-27|
EP3643066A1|2020-04-29|
US10904565B2|2021-01-26|
KR20200020722A|2020-02-26|
SG11201910399PA|2020-01-30|
CN110754087A|2020-02-04|
WO2018237303A1|2018-12-27|
EP3643066B1|2021-09-01|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title

AU2003246987A1|2002-07-09|2004-01-23|Nokia Corporation|Method and system for selecting interpolation filter type in video coding|
CN100469142C|2003-08-05|2009-03-11|Nxp股份有限公司|Video encoding and decoding methods and corresponding devices|
MX2013010231A|2011-04-12|2013-10-25|Panasonic Corp|Motion-video encoding method, motion-video encoding apparatus, motion-video decoding method, motion-video decoding apparatus, and motion-video encoding/decoding apparatus.|
EP3332551A4|2015-09-02|2019-01-16|MediaTek Inc.|Method and apparatus of motion compensation for video coding based on bi prediction optical flow techniques|
US20180192071A1|2017-01-05|2018-07-05|Mediatek Inc.|Decoder-side motion vector restoration for video coding|
EP3435673A4|2016-03-24|2019-12-25|Intellectual Discovery Co., Ltd.|Method and apparatus for encoding/decoding video signal|
JPWO2019003993A1|2017-06-26|2019-12-26|Panasonic Intellectual Property Corporation of America|Encoding device, decoding device, encoding method and decoding method|
US10841610B2|2017-10-23|2020-11-17|Avago Technologies International Sales Pte. Limited|Block size dependent interpolation filter selection and mapping|
KR20200096917A|2017-12-08|2020-08-14|Panasonic Intellectual Property Corporation of America|Image encoding device, image decoding device, image encoding method and image decoding method|
US20190238883A1|2018-01-26|2019-08-01|Mediatek Inc.|Hardware Friendly Constrained Motion Vector Refinement|
WO2019234600A1|2018-06-05|2019-12-12|Beijing Bytedance Network Technology Co., Ltd.|Interaction between pairwise average merging candidates and intra-block copy |
TWI739120B|2018-06-21|2021-09-11|大陸商北京字節跳動網絡技術有限公司|Unified constrains for the merge affine mode and the non-merge affine mode|
GB2589223A|2018-06-21|2021-05-26|Beijing Bytedance Network Tech Co Ltd|Component-dependent sub-block dividing|
WO2020065518A1|2018-09-24|2020-04-02|Beijing Bytedance Network Technology Co., Ltd.|Bi-prediction with weights in video coding and decoding|
CN112956202A|2018-11-06|2021-06-11|北京字节跳动网络技术有限公司|Extension of inter prediction with geometric partitioning|
JP2020113923A|2019-01-15|2020-07-27|Fujitsu Limited|Moving picture coding program and moving picture coding device|
US11178414B2|2019-02-27|2021-11-16|Mediatek Inc.|Classification for multiple merge tools|
CN113597769A|2019-03-19|2021-11-02|Huawei Technologies Co., Ltd.|Video inter-frame prediction based on optical flow|
CN113661708A|2019-04-02|2021-11-16|Beijing Bytedance Network Technology Co., Ltd.|Video encoding and decoding based on bidirectional optical flow|
EP3922015A1|2019-04-19|2021-12-15|Beijing Bytedance Network Technology Co. Ltd.|Gradient calculation in different motion vector refinements|
WO2020255903A1|2019-06-21|2020-12-24|Panasonic Intellectual Property Corporation of America|Coding device, decoding device, coding method, and decoding method|
WO2021055643A1|2019-09-17|2021-03-25|Beijing Dajia Internet Information Technology Co., Ltd.|Methods and apparatus for prediction refinement with optical flow|
WO2021054886A1|2019-09-20|2021-03-25|Telefonaktiebolaget Lm Ericsson |Methods of video encoding and/or decoding with bidirectional optical flow simplification on shift operations and related apparatus|
WO2021061322A1|2019-09-24|2021-04-01|Alibaba Group Holding Limited|Motion compensation methods for video coding|
Legal status:
2021-10-05| B11A| Dismissal acc. art.33 of ipl - examination not requested within 36 months of filing|
2021-11-03| B350| Update of information on the portal [chapter 15.35 patent gazette]|
2021-12-21| B11Y| Definitive dismissal - extension of time limit for request of examination expired [chapter 11.1.1 patent gazette]|
Priority:
Application number | Filing date | Patent title
US201762524398P| true| 2017-06-23|2017-06-23|
US62/524,398|2017-06-23|
US16/015,046|US10904565B2|2017-06-23|2018-06-21|Memory-bandwidth-efficient design for bi-directional optical flow (BIO)|
US16/015,046|2018-06-21|
PCT/US2018/039065|WO2018237303A1|2018-06-22|A memory-bandwidth-efficient design for bi-directional optical flow (BIO)|